Introduction to psychological research 2

Page 1: Introduction to psychological research 2

WESTERBERG, VM Page 1 of 13

Research report extract:

Friedenberg, M.A., & Kendler, B. (1999). Double-blind study of the possible proximity effect of sucrose on skeletal muscle strength. Perceptual and Motor Skills, 89, 966-968.

Student's t test and chi-square analyses.

Westerberg, V.M.

Date: 7 May 2010

INTRODUCTION TO PSYCHOLOGICAL RESEARCH 2

Question 1

Critically evaluate the main conclusion of the study: Strengths, weaknesses and overall determination of the trustworthiness of the study.

The validity of the design of any experimental research study is a fundamental part of the scientific method. Without a valid design, valid scientific conclusions cannot be drawn.

Internal validity is an estimate of the degree to which conclusions about causal relationships (i.e. cause and effect) can be drawn, based on the measures used, the research setting and the design. Experiments with high internal validity lead to trustworthy results.

There are factors that can strengthen or weaken internal validity. Among the former are the use of a standardised scale, an adequate number of levels of the independent variable (at least two, otherwise it cannot vary), a correct choice of the magnitude of those levels, and sensitivity to the form of the relationship between the variables (linear or curvilinear). Internal validity can be weakened by poor construct validity (an inadequate cause-effect relationship), poor control of extraneous variables (noise), sample bias, dropouts (not reported or not managed), invalid or unreliable measures, and inadequate statistical analysis (missing data, wrong calculations).

We are also asked to evaluate the main conclusion of the study. Statistical conclusion validity is the degree to which conclusions reached about relationships between variables are justified. It requires adequate sampling procedures, appropriate statistical tests and reliable measurement procedures. Conclusion validity is concerned only with whether there is any relationship at all between the variables being studied. That relationship may be merely a correlation rather than a causal one.

Construct validity involves the quality of the choices made about the particular forms of the independent and dependent variables. These choices affect the quality of the research findings. Threats to construct validity can arise from the choice of treatment (the operationalization of the IV and the delivery of the experiment). They can also arise from the choice of outcome measure (the operationalization of the DV and the administration of the measurement).

Inadequate operationalization of the IV obscures the relationship being studied; an example is lack of reliability, where results vary from one measurement to the next. We have not been given the results of the individual measurements, so it is not possible to assess this point. The IV should be representative. The procedure must be an adequate operational representation of the theoretical construct of interest and must have a measurable impact on research participants.

Further threats to the internal validity of a study are experimental artifacts. These arise from the way the study is presented to the participants or from the research setting. How motivated are the researcher and each participant? Are the instructions clear? Is the environment adequate for the chosen experiment? Is there a language or disability barrier?

Lack of control of extraneous and moderating variables can interfere with internal validity (i.e. with the attempt to isolate causal relationships). Events outside the experiment, or between repeated measures of the dependent variable (extraneous effects or history), may affect participants' responses to the experimental procedures. They can change participants' attitudes and behaviours in such a way that it becomes impossible to tell whether a change in the dependent measures is due to the independent variable or to the historical event.

One example is the nutritional status of participants. The study states that the volunteers were in a fasting state and had not eaten breakfast, because the experiment was done relatively early in the morning. That is not a good enough reason to assume that they had in fact eaten nothing. If the participants really had been fasting, as the researchers presumed, they would be weaker. They might then have found it difficult to hold the bottles, which would shift the outcome of the experiment towards muscle weakness. Pre- and postprandial status could be moderating variables to consider if the test is repeated.

As the experiment progresses and time passes, maturation takes place. This is not tied to particular events. It ranges from growing hungrier to becoming more tired, uninterested or discouraged. These changes may alter the way a subject reacts to the independent variable. At the end of the study, the researcher may then be unable to determine whether the discrepancy is due to the passage of time or to the independent variable.

Taking this test repeatedly would not make participants any wiser: they cannot learn anything from one hold that they could apply to the next. Practice effects should therefore not be expected in this test.

The researchers used a commercial cable tensiometer to measure the maximal isometric voluntary contraction (MIVC) of the biceps brachii; this is how the DV was operationalized (instrumentation). Every measure should give stable results. If the variation between measurements is large, the usefulness of the measure is compromised, and so is the result of the study. Additionally, the dependent measure should be sensitive enough to detect any relevant difference in outcome.

An isometric exercise means maintaining a muscle contraction for a period of time without a change in muscle length. Holding an object in this way involves a large number of different joints and muscle groups. The study claims to have measured the contraction of one single muscle, the biceps brachii. Dynamometers have been developed for single-muscle evaluation that provide more specific, but still not conclusive, information on individual muscles. Detailed information about the characteristics of the tensiometer used has not been provided.

Additionally, values obtained from MIVC testing are difficult to interpret at present, as normative data are limited. Another question is whether the MIVC was measured using the Quantitative Muscle Assessment system, and whether age- and sex-related reference values were calculated.

Age could alter the results because younger people may be more resistant to fatigue. Sex could influence the outcome because, on average, men are stronger than women and may be more resistant to fatigue for a given load. When you are fatigued, you have to work harder (contract the muscles more) to keep up with the workload. When fatigue is so intense that it weakens the individual, he or she may want to stop and may be unable to complete the test.

As a way of measuring muscle strength, a tensiometer (or dynamometer) seems adequate. Electromyography might be regarded as the ideal test to evaluate the activity of a single muscle, but its invasive (needle) form would be hard to justify ethically for this purpose. The tensiometer would be expected to be well calibrated and to display measurements on at least an interval scale.

As for the scale of measurement, the International System of Units (SI) appears to have been applied, but inconsistently. The containers (bottles) are described as "45-gm bottles". If this means grams, the symbol should have been "g" or, better still, the SI base unit: 0.045 kg.

If the bottles were identical, they would have the same size, shape and colour and, ideally for the purpose of this experiment, the same weight. But bottles of the same shape and size could hardly have had the same weight. The density of sand is highly heterogeneous, depending on its constituents (types of minerals), degree of compaction and moisture, whereas the density of sucrose is homogeneous (ρ = 1.587 g/cm³). The test results might therefore be attributable not to the close proximity of sucrose to skeletal muscle, but to the difference in weight between the bottles. The results could be explained by the sand bottle being lighter and easier to lift, and so requiring less muscle tension. That lower tension was then mistakenly interpreted as sucrose proximity to skeletal muscle increasing strength. Finally, bottle size should have been expressed in units of volume (litres or cm³), not in grams, which is a unit of mass.
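The weight argument above rests on the relation mass = density × volume. The minimal Python sketch below illustrates it for two bottles of equal volume. The 30 cm³ fill volume and the range of sand densities are hypothetical values chosen only for illustration, since the study reports neither. The sucrose density is the value quoted above, treating the fill as solid sucrose as the text does.

```python
# Sketch: mass = density x volume for two bottles of the same size.
# ASSUMPTION: the 30 cm^3 fill volume and the sand densities are hypothetical.
SUCROSE_DENSITY = 1.587  # g/cm^3, value quoted in the text


def fill_mass(density_g_per_cm3: float, volume_cm3: float) -> float:
    """Mass in grams of a fill with the given density and volume."""
    return density_g_per_cm3 * volume_cm3


volume_cm3 = 30.0  # hypothetical bottle volume
sucrose_mass = fill_mass(SUCROSE_DENSITY, volume_cm3)

for sand_density in (1.3, 1.6, 1.9):  # hypothetical sand densities in g/cm^3
    sand_mass = fill_mass(sand_density, volume_cm3)
    print(f"sand {sand_density} g/cm^3: {sand_mass:.1f} g, "
          f"sucrose: {sucrose_mass:.1f} g, "
          f"difference: {sand_mass - sucrose_mass:+.1f} g")
```

Under these assumptions, the same size, shape and colour does not guarantee the same weight. Only weighing the filled bottles would settle the question.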

Instrumentation effects may produce changes in the obtained measurements. These include changes in the calibration of a measurement tool and changes in the observers or scorers. The study makes no mention of this. Suppose instrumentation changes occurred, or the tensiometer lost its calibration through repeated use. The internal validity of the main conclusion would then be affected, as alternative explanations could account for the results obtained.

The study does not say whether there were outliers among the participants, that is, individuals much stronger or weaker than the average (the mean). This matters for the internal validity of the conclusion, because the observed difference in strength could be attributable to outliers rather than to sucrose proximity to skeletal muscle.

The study does not mention dropouts. Were there any? What were the characteristics of those who dropped out? At which point did they drop out? Any dropouts could have influenced the results if they were the weaker participants (for example, women, who are on average less strong). That would leave the strongest volunteers to carry on with the test. These volunteers might have found the bottle of sucrose comparatively light and been able to hold it with less effort than weaker individuals. If dropping out leads to relevant bias between groups, alternative explanations for the observed differences become possible.

Experimenter bias was considerably reduced through the use of a double-blind study design, in which the experimenter does not know the condition to which each participant is exposed. This is a strength of the experiment.

In a double-blind experiment, neither the participants nor the researchers know who is in the control condition and who is in the experimental condition. Only after all the data have been recorded (and in some cases analysed) do the researchers learn which is which. Running an experiment in double-blind fashion lessens the influence of prejudices and unintentional physical cues on the results (the placebo effect, observer bias and experimenter bias). The key that identifies the subjects and the condition they belonged to is kept by a third party and is not given to the researchers until the study is over. Additionally, the subjects were blind to the hypothesis, which avoids participant bias (e.g. a desire to confirm or disprove the research hypothesis).

A further positive point in this study is that the order of sucrose and sand presentations was counterbalanced across subjects, so that order effects, even though they are not removed, are taken into account and controlled.
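The blinding and counterbalancing procedures described above can be sketched in Python. This is a hypothetical illustration, not the authors' actual procedure, which the article does not detail. The function name `allocate`, the neutral bottle codes "A" and "B", and the seed are assumptions, and 28 participants matches df = 27.

The sketch works in three stages:

- A third party decides at random which code stands for sucrose and keeps that key.
- The schedule given to the experimenter contains only the codes.
- The two presentation orders are used (almost) equally often.

```python
import random


def allocate(participant_ids, seed=None):
    """Hypothetical sketch of a counterbalanced, double-blind allocation.

    Returns (schedule, key). The schedule, given to the experimenter and the
    participants, uses only the neutral codes "A" and "B". The key, kept by a
    third party until the data are recorded, says which code is sucrose.
    """
    rng = random.Random(seed)
    # The third party randomly decides which neutral code stands for sucrose.
    contents = ["sucrose", "sand"]
    rng.shuffle(contents)
    key = {"A": contents[0], "B": contents[1]}
    # Shuffle participants, then alternate the two orders so that each order
    # is used (almost) equally often: this is the counterbalancing.
    ids = list(participant_ids)
    rng.shuffle(ids)
    schedule = {
        pid: ("A", "B") if i % 2 == 0 else ("B", "A")
        for i, pid in enumerate(ids)
    }
    return schedule, key


schedule, key = allocate(range(1, 29), seed=2010)
print("participant 1 holds the bottles in the order", schedule[1])
# Only the third party consults `key`, after all strength data are recorded.
```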

The choice of a within-subject design is also adequate. Within-subject designs generally have more statistical power than between-subject designs. They require fewer participants, which suits a small sample like the one in this study, and they offer better control of stable participant variables such as age and gender.

The sand condition was intended to be the control in this study, but it does not really work as one. Only the comparison with the sand condition makes the sucrose results agree with the researchers' hypothesis. The muscle tension recorded in the sucrose condition is greater than in the sand condition, and from this the researchers infer that sucrose proximity to skeletal muscle affects muscle strength. Moreover, the researchers should have explained why they assumed that sand, and not sucrose, is inert. The assumption that the sucrose and sand bottles weigh the same is not justified. Both bottles should weigh the same, not just be the same size, shape and colour (see the earlier explanation of density differences).

The choice of statistical test, Student's t test, is correct. It is a test of significance: it evaluates whether there is a difference between two sample means and whether that difference is likely to be a chance effect. The study mentions that "a paired t test was used to analyse responses", which means that the study has a within-subject design.

The choice of alpha level (0.05, an exploratory level) is correct. It indicates that the researchers are prepared to accept a 5% chance of making a Type I error, that is, of rejecting the null hypothesis when it is true. The way the statistical results are reported in the conclusion, however, is inadequate:

- There is no need to give units of measurement, such as kg, after the mean values.
- The t value given lacks the degrees of freedom in parentheses.
- There should be no "vs" between the results of the two conditions.

The conclusion should read as follows:

The t test indicated that the mean maximal isometric voluntary contraction of the biceps brachii while holding the bottle of sucrose (M= 18.05, SD= 5.46) was significantly greater than when holding the bottle of sand (M= 17.86, SD= 5.27), t(27)=-2.08, p<0.05

Given the flaws in internal validity mentioned above, one may also wonder whether the statistics were calculated correctly.

The critical t value (α = 0.05, two-tailed test, df = N - 1 = 27) is 2.052.

The study has a within-subject design, as the authors state that "a paired t test was used to analyse the results" and that "the significant level was set at p<.05".

The observed t value given is t = -2.08. In a paired t test, t is computed from each participant's difference score (sucrose minus sand):

$$t_{obs} = \frac{M_D}{SD_D / \sqrt{N}}$$

Here M_D is the mean difference (18.05 - 17.86 = 0.19) and SD_D is the standard deviation of the difference scores. The study does not report SD_D. It cannot be derived from the two condition standard deviations (5.46 and 5.27) without knowing the correlation between the two conditions. Dividing the difference between those standard deviations by √N does not give the standard error of the mean difference. The reported t value therefore cannot be checked from the published summary statistics.

The sign of a paired t only reflects the order of subtraction. Subtracting the sucrose mean from the sand mean gives a negative value even though the sucrose mean (18.05) is larger than the sand mean (17.86). The negative value is therefore not wrong in itself, but it is misleading unless the direction of the difference is stated. The larger sucrose mean implies that sucrose proximity affects muscle strength positively, increasing it, rather than negatively as previous research had shown. The research hypothesis of the current study only states that sucrose proximity affects muscle strength, not in which direction.

In absolute value, tobserved = 2.08 > tcritical = 2.052. This means that the null hypothesis may be rejected and that the results obtained are unlikely to be due to a chance effect, although only by a very narrow margin.
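A paired t test is computed from each participant's difference score: t = M_D / (SD_D / √N). The standard deviation of the differences depends on both condition SDs and on their correlation r:

$$SD_D^2 = SD_1^2 + SD_2^2 - 2r \cdot SD_1 \cdot SD_2$$

The Python sketch below uses only the summary statistics quoted from the article. It works backwards from the reported t to the SD of the differences, and the between-condition correlation, that the published result would require. `implied_sd_diff` and `implied_r` are quantities derived here for illustration, not values reported by the study.

```python
import math


def paired_t(mean_diff, sd_diff, n):
    """Paired t: mean of the difference scores divided by its standard error."""
    return mean_diff / (sd_diff / math.sqrt(n))


def sd_of_differences(sd1, sd2, r):
    """SD of difference scores, given the two condition SDs and their correlation r."""
    return math.sqrt(sd1 ** 2 + sd2 ** 2 - 2 * r * sd1 * sd2)


# Summary statistics quoted from the article
n = 28
m_sucrose, sd_sucrose = 18.05, 5.46
m_sand, sd_sand = 17.86, 5.27
mean_diff = m_sucrose - m_sand  # 0.19 kg
reported_t = 2.08               # absolute value of the reported t(27)

# Derived for illustration: the SD of the differences that the reported t implies...
implied_sd_diff = mean_diff * math.sqrt(n) / reported_t
# ...and the correlation between conditions that would produce that SD.
implied_r = (sd_sucrose ** 2 + sd_sand ** 2 - implied_sd_diff ** 2) / (
    2 * sd_sucrose * sd_sand
)

print(f"implied SD of differences: {implied_sd_diff:.2f} kg")
print(f"implied correlation between conditions: {implied_r:.3f}")
check_t = paired_t(mean_diff, sd_of_differences(sd_sucrose, sd_sand, implied_r), n)
print(f"check: t = {check_t:.2f}")
```

Under these assumptions, the reported t would require a correlation of roughly 0.997 between the two measurements of the same people. Very high correlations between repeated strength measurements may not be implausible, so the summary statistics alone neither confirm nor refute the reported value.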

With regard to the main conclusion of the study, it should be emphasized that internal validity estimates the degree to which conclusions about causal relationships can be drawn, based on the measures used, the research setting and the whole research design. A good experimental technique studies the effect of an independent variable on a dependent variable under highly controlled conditions. Such a technique allows higher internal validity and highly trustworthy results, whatever these may be. Even if the result indicates that the null hypothesis cannot be rejected, the test will have shed some light on the hypothesis tested.

In view of the deficient procedure, the incomplete reporting of the statistics and the weak internal validity of the test, the results should be interpreted with caution. A re-test is highly advisable, with a larger, more homogeneous sample, close observance of internal validity criteria, and careful, fully reported calculations.

Question 2

Imagine a hypothetical study that compared the muscle strength of a group of people in close proximity to sucrose to the muscle strength of another group of people who were in close proximity to sand (a placebo group).

Using the following hypothetical data carry out an appropriate t test to see if the group means are significantly different. The dependent variable is the same as in the journal article, with strength being measured in kilograms (kg).

Group 1 (proximity to sucrose) 18, 20, 17, 18, 19, 17, 19, 17, 19, 20 kg

Group 2 (proximity to sand) 18, 19, 17, 17, 15, 19, 16, 15, 17, 18 kg

Show all seven steps of null hypothesis testing. All mathematical calculations must be shown. Report the results as you would in a research report.

********

Use a two-tailed test and an alpha level of 0.05 (at those steps explain why those choices were made).

Step 1: State the hypotheses.

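H0: μ1 = μ2. Null hypothesis: there is no difference in mean skeletal muscle strength between the sucrose group (1) and the sand group (2).

H1: μ1 ≠ μ2. Research hypothesis: there is a difference in mean skeletal muscle strength between the sucrose group and the sand group.

(Note: in a two-tailed test the hypothesis is non-directional, so the difference can be in either direction. A two-tailed test is chosen because the direction of the effect cannot be predicted with confidence. Previous research had reported an adverse effect of sucrose proximity on muscle strength, whereas the journal article reported an increase.)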

Step 2: Select an appropriate alpha level: Use 0.05

An alpha (α) level of 0.05 is the conventional level in psychological research. Compared with a stricter level such as 0.01, it makes significant results easier to obtain with small samples. The cost is a greater risk of a Type I error, that is, of rejecting the null hypothesis when it is true. Making a Type I error means claiming a difference when in truth there is none, which indicates poor specificity. A Type I error is a false positive and can be regarded as an error of excessive credulity. With α = 0.05, this risk is limited to 5%.
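What α = 0.05 means in practice can be illustrated by simulation. The Python sketch below repeatedly draws two groups of 10 from the same normal population, so the null hypothesis is true by construction. The population mean of 17.5 kg and SD of 1.3 kg are arbitrary assumptions. The sketch counts how often the independent t test exceeds the critical value of 2.101 (df = 18) used in Step 5. Under these assumptions, the share of such false positives should come out close to 5%.

```python
import math
import random


def pooled_t(x, y):
    """Independent-samples t statistic with pooled sample variance."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    pooled_var = ss / (nx + ny - 2)
    return (mx - my) / math.sqrt(pooled_var * (1 / nx + 1 / ny))


def type_i_error_rate(n_sims=20000, n=10, t_crit=2.101, seed=2010):
    """Share of simulated studies with NO true difference that come out 'significant'."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_sims):
        # Both groups come from the same population, so H0 is true by construction.
        # ASSUMPTION: mean 17.5 kg and SD 1.3 kg are arbitrary illustrative values.
        x = [rng.gauss(17.5, 1.3) for _ in range(n)]
        y = [rng.gauss(17.5, 1.3) for _ in range(n)]
        if abs(pooled_t(x, y)) > t_crit:
            false_positives += 1
    return false_positives / n_sims


print(f"simulated Type I error rate: {type_i_error_rate():.3f}")
```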

Step 3: Select the correct statistical test.

To determine whether there is a statistically significant difference between two sample means, we use a t test. The difference between the sample means must be large enough for us to say with confidence that it reflects a real difference in the population of interest, rather than a chance effect.

Types of t test:

1. Unrelated (independent) t test: between-subject design.
2. Related (dependent or paired) t test: within-subject design.
3. Single-sample t test: compares a group mean with a known value from previous research or a test.

The hypothetical study has a between-subject design (different people in each group), so an unrelated (independent) t test should be used.

Step 4: Check the test statistic assumptions.

Interval or ratio data?

Interval data are measured on a scale on which the difference between two values is meaningful and follows a linear scale. For example, in physics a temperature of 0 degrees on the Celsius or Fahrenheit scale does not mean "no temperature"; in biology, a pH of 0.0 does not mean "no acidity". Interval data are continuous data whose differences are interpretable, ordered and follow a constant scale. However, there is no "natural" zero meaning "absence of" the quantity. Examples are temperature in °C or °F, dates and pH.

Ratio data are also ordered and follow a constant scale, but they have a natural zero, so the ratio between two values is interpretable (20 kg is twice 10 kg). Examples are height, weight, age and length.

The dependent variable in this hypothetical study measures strength in units of mass (kilograms). Weight is ratio data, so this assumption is met.
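The difference between interval and ratio scales can be shown with a small calculation, sketched in Python using the standard conversion K = °C + 273.15. 20 °C is not "twice as hot" as 10 °C, because the Celsius zero is arbitrary. In absolute (Kelvin) terms, it is only about 3.5% higher. By contrast, 20 kg really is twice 10 kg, because the kilogram scale has a true zero.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to the absolute (Kelvin) scale."""
    return celsius + 273.15


# Interval scale: the Celsius zero is arbitrary, so 20/10 is not a meaningful ratio.
naive_ratio = 20 / 10
# On the absolute scale, 20 °C is only about 3.5% higher than 10 °C.
absolute_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)
# Ratio scale: kilograms have a true zero, so 20 kg really is twice 10 kg.
mass_ratio = 20 / 10

print(f"naive Celsius ratio: {naive_ratio:.2f}")
print(f"ratio of absolute temperatures: {absolute_ratio:.3f}")
print(f"mass ratio: {mass_ratio:.2f}")
```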

Step 5: Calculate the critical value.

Alpha level = 0.05
Two-tailed test
Degrees of freedom (df) = (n1 - 1) + (n2 - 1) = (n1 + n2) - 2

The degrees of freedom are the number of observations that are free to vary and supply independent bits of information. A critical t value table can be consulted to obtain the requested value.

df = (n1 + n2) - 2 = (10 + 10) - 2 = 18

Critical t value (α = 0.05, two-tailed test, df = 18) = ±2.101 (≈ ±2.10)

tcritical = ±2.10

(Note: The ± sign in the critical t value indicates that the rejection region lies in both tails of the distribution. When comparing this value with the observed t, only the absolute value is taken into account.)

Step 6: Calculate the test statistic observed value.

Notation:

- tobs = observed t
- MT = mean of the therapy (sucrose) group
- MC = mean of the control (sand) group
- nT = sample size of the therapy group
- nC = sample size of the control group
- SD = standard deviation
- varT = variance (= SD²) of the therapy group
- varC = variance (= SD²) of the control group

(Note: Only the final value of interest, in this case tobs, is rounded to two decimal places, to avoid accumulating rounding errors during the calculations.)

$$t_{obs} = \frac{M_T - M_C}{SD_{pooled}\sqrt{\frac{1}{n_T} + \frac{1}{n_C}}}, \qquad SD_{pooled} = \sqrt{\frac{var_T + var_C}{2}} \quad \text{(when the two groups are the same size)}$$

The values below use sample variances (n - 1 denominator):

- MT = 184/10 = 18.4
- MC = 171/10 = 17.1
- varT = 12.4/9 = 1.3778, so SDT = 1.1738
- varC = 18.9/9 = 2.1000, so SDC = 1.4491

$$SD_{pooled} = \sqrt{\frac{1.1738^2 + 1.4491^2}{2}} = \sqrt{\frac{1.3778 + 2.1000}{2}} = \sqrt{1.7389} = 1.3187$$

$$t_{obs} = \frac{18.4 - 17.1}{1.3187\sqrt{\frac{1}{10} + \frac{1}{10}}} = \frac{1.3}{1.3187 \times \sqrt{0.2}} = \frac{1.3}{1.3187 \times 0.4472} = \frac{1.3}{0.5897} \approx 2.20$$

tobs = 2.20

Let us now evaluate the result: tobs = 2.20 > tcrit = 2.10. Therefore we reject the null hypothesis and conclude that there is a difference between the means of the two groups that is unlikely to be due to a chance effect.

Step 7: State the outcome of the test.

(Note: Values have been rounded to two decimal places.)

i.- For a formal report

The skeletal muscle strength of the sucrose group (M = 18.40, SD = 1.17) was significantly greater than the skeletal muscle strength of the sand group (M = 17.10, SD = 1.45), t(18) = 2.20, p < 0.05.

Therefore we reject the null hypothesis and conclude that there is a difference between the means of the two groups that is unlikely to be due to a chance effect. The probability of making a Type I error (saying there is a difference in skeletal muscle strength when in fact there isn't) is ≤ 5%, which is acceptable for the purpose of this study.

ii.- In ordinary language.

The control group (the sand group) had an average skeletal muscle strength of 17.10 kg, compared with an average of 18.40 kg for the therapy group (the sucrose group). These results support the possibility that sucrose proximity to skeletal muscle somehow affects muscle strength, increasing it, and we believe that this result is unlikely to have happened by chance.
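The Step 6 calculation can be checked with a short Python sketch that uses only the standard library and carries full precision throughout. `statistics.variance` and `statistics.stdev` use the n - 1 denominator, matching the sample standard deviations used above. The critical value of 2.101 is taken from the t table in Step 5 rather than computed.

```python
import math
import statistics

sucrose = [18, 20, 17, 18, 19, 17, 19, 17, 19, 20]  # kg, Group 1
sand = [18, 19, 17, 17, 15, 19, 16, 15, 17, 18]     # kg, Group 2


def independent_t(x, y):
    """Student's independent-samples t test with pooled variance; returns (t, df)."""
    nx, ny = len(x), len(y)
    # statistics.variance uses the n - 1 denominator (sample variance).
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se, nx + ny - 2


T_CRITICAL = 2.101  # alpha = 0.05, two-tailed, df = 18 (from a t table, as in Step 5)

t_obs, df = independent_t(sucrose, sand)
print(f"sucrose: M = {statistics.mean(sucrose):.2f}, SD = {statistics.stdev(sucrose):.2f}")
print(f"sand:    M = {statistics.mean(sand):.2f}, SD = {statistics.stdev(sand):.2f}")
print(f"t({df}) = {t_obs:.2f}, reject H0: {abs(t_obs) > T_CRITICAL}")
```

With equal group sizes, the pooled variance reduces to (varT + varC)/2, the formula used in Step 6.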

Question 3

Let us imagine that the machine recording muscle strength could only indicate if a person was "Strong" or "Not Strong". Strong people are those that can exert a force of 18 kg or more. "Not Strong" people exert a force less than 18 kg.

Using the interval data in Question 2 transform the participants into "Strong" and "Not Strong" categories, and then carry out an appropriate chi square test to see if there is a significant difference between the sucrose and sand groups. Again use an alpha level of 0.05.

Show all seven steps of null hypothesis testing. All mathematical calculations must be shown. Report the results as you would in a research report.

Explain any discrepancy between the results of the t test and the chi square test.

Step 1: State the hypotheses

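H0: fo = fe. Null hypothesis: the effect of sucrose on skeletal muscle strength equals the effect of sand on skeletal muscle strength. That is, the observed frequencies of "Strong" and "Not Strong" participants do not differ from those expected if group and strength category are independent.

H1: fo ≠ fe. Research hypothesis: the effect of sucrose on skeletal muscle strength differs from the effect of sand on skeletal muscle strength.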

Step 2: Select an appropriate alpha level: Use 0.05

As in Question 2, the conventional alpha (α) level of 0.05 is used. It makes significant results easier to obtain with small samples than a stricter level would. The cost is a 5% risk of a Type I error, that is, of rejecting the null hypothesis when it is true. A Type I error is a false positive and can be regarded as an error of excessive credulity.

Step 3: Select the correct statistical test.

The data are frequencies of occurrence of nominal (categorical) data in two independent groups, so a chi-square (χ²) test of independence is appropriate.

Step 4: Check the test statistic assumptions.

1. The data are frequencies of occurrence (fo), that is, the number of observations in each category.
2. The observations must be independent (unrelated data), that is, different people must be in each category.
3. The expected frequency (fe) should be at least 5 in every cell, that is, there must be a sufficient number of observations in each category. (After the calculations below, two cells do not meet this requirement. The results should therefore be interpreted cautiously, and a re-run of the test with a larger sample is recommended.)

Step 5: Calculate the critical value.

Alpha level = 0.05
Non-directional hypothesis (the χ² test itself uses only the upper tail of the χ² distribution)

Degrees of freedom (df) = (number of rows - 1) × (number of columns - 1) = (2 - 1) × (2 - 1) = 1

The degrees of freedom are the number of observations that are free to vary and supply independent bits of information. A table of critical χ² values can be consulted to obtain the required value.

Critical χ² value (α = 0.05, df = 1) = 3.84

χ²critical = 3.84

Step 6: Calculate the test statistic observed value.

$$f_e = \frac{(\text{row total})(\text{column total})}{\text{grand total}}, \qquad \chi^2_{observed} = \sum \frac{(f_o - f_e)^2}{f_e}$$

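Marginal totals: strong = 7 + 4 = 11, not strong = 3 + 6 = 9; sucrose = 10, sand = 10; grand total = 20.

| Cell | fo | fe | fo - fe | (fo - fe)² | (fo - fe)²/fe |
|---|---|---|---|---|---|
| Sucrose, strong | 7 | (11 × 10)/20 = 5.5 | 7 - 5.5 = 1.5 | 2.25 | 2.25/5.5 = 0.41 |
| Sucrose, not strong | 3 | (9 × 10)/20 = 4.5 | 3 - 4.5 = -1.5 | 2.25 | 2.25/4.5 = 0.50 |
| Sand, strong | 4 | (11 × 10)/20 = 5.5 | 4 - 5.5 = -1.5 | 2.25 | 2.25/5.5 = 0.41 |
| Sand, not strong | 6 | (9 × 10)/20 = 4.5 | 6 - 4.5 = 1.5 | 2.25 | 2.25/4.5 = 0.50 |
| Total | 20 | 20 | 0 | | χ²observed = 1.82 |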

χ²observed = 1.82 < χ²critical = 3.84. Therefore, we fail to reject the null hypothesis: there is no evidence of a difference in skeletal muscle strength between the sucrose group and the sand group.

The sample is too small. Two of the cells had expected frequencies below 5. These findings should therefore be treated with caution, as one of the statistical assumptions was not met.
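The categorisation and the χ² calculation in Steps 5 and 6 can be reproduced with the Python sketch below. The function names are hypothetical. The 18 kg threshold comes from the question and the critical value of 3.84 from Step 5. The sketch also lists the cells whose expected frequency falls below 5.

```python
sucrose = [18, 20, 17, 18, 19, 17, 19, 17, 19, 20]  # kg
sand = [18, 19, 17, 17, 15, 19, 16, 15, 17, 18]     # kg
CATEGORIES = ("Strong", "Not Strong")
CHI2_CRITICAL = 3.84  # alpha = 0.05, df = 1 (from a chi-square table, as in Step 5)


def strength_category(kg, threshold=18):
    """'Strong' means a force of 18 kg or more, as defined in the question."""
    return "Strong" if kg >= threshold else "Not Strong"


def chi_square_independence(groups):
    """Pearson chi-square for a group x strength-category table.

    Returns (chi2, df, expected), where expected maps (group, category) -> fe.
    """
    observed = {g: {c: 0 for c in CATEGORIES} for g in groups}
    for g, values in groups.items():
        for v in values:
            observed[g][strength_category(v)] += 1
    group_totals = {g: sum(observed[g].values()) for g in groups}
    category_totals = {c: sum(observed[g][c] for g in groups) for c in CATEGORIES}
    grand_total = sum(group_totals.values())
    chi2, expected = 0.0, {}
    for g in groups:
        for c in CATEGORIES:
            fe = category_totals[c] * group_totals[g] / grand_total
            expected[(g, c)] = fe
            chi2 += (observed[g][c] - fe) ** 2 / fe
    df = (len(CATEGORIES) - 1) * (len(groups) - 1)
    return chi2, df, expected


chi2, df, expected = chi_square_independence({"sucrose": sucrose, "sand": sand})
n_total = len(sucrose) + len(sand)
print(f"chi2({df}, N = {n_total}) = {chi2:.2f}, reject H0: {chi2 > CHI2_CRITICAL}")
print("cells with fe < 5:", [cell for cell, fe in expected.items() if fe < 5])
```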

Step 7: State the outcome of the test.

i.- For a formal report

There was no significant difference in the proportion of "Strong" participants between the sucrose group (70%, 7 of 10) and the sand group (40%, 4 of 10), χ²(1, N = 20) = 1.82, p > 0.05.

The sample is too small: the larger the sample, the greater the power to detect a real difference. Additionally, two of the cells had expected frequencies below 5, so one of the statistical assumptions was not met. The results should therefore be treated with caution, and it would be advisable to repeat the study with a larger sample.

ii.- In ordinary language.

In the sucrose group, 7 of the 10 participants (70%) were classified as "Strong", compared with only 4 of the 10 (40%) in the sand group. This difference, however, could well be due to a chance effect. Even so, the trend in the sample favours greater strength in the sucrose group. In view of the small sample size, it would be prudent to repeat the study with more participants.

Explain any discrepancy between the results of the t test and the chi square test.

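The results of the t test point to sucrose having a positive influence on skeletal muscle strength, whereas the results of the chi-square test indicate that the difference may be due to a chance effect. The main reason for the discrepancy is that transforming the ratio data into two categories discards information. A participant who exerts 15 kg and one who exerts 17 kg are both simply "Not Strong", and one who exerts 18 kg counts the same as one who exerts 20 kg. The t test uses the actual values and is therefore more sensitive (more powerful) than the chi-square test performed on the categories.

Again, given the small number of participants, a re-run with a larger sample may offer more convincing evidence about the influence of sucrose proximity on skeletal muscle strength, in either direction. This should take into account that previous research had shown an adverse effect of sucrose proximity on skeletal muscle strength.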
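The loss of information caused by dichotomising the data can be illustrated with a short Python sketch. It runs the same pooled t statistic twice: once on the kilogram values, and once on the same data coded 1 ("Strong", 18 kg or more) and 0 ("Not Strong"). Running a t test on 0/1 codes is not a recommended analysis. It is used here only to show how much signal the two categories keep.

```python
import math

sucrose = [18, 20, 17, 18, 19, 17, 19, 17, 19, 20]  # kg
sand = [18, 19, 17, 17, 15, 19, 16, 15, 17, 18]     # kg


def pooled_t(x, y):
    """Independent-samples t statistic with pooled sample variance."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    pooled_var = ss / (nx + ny - 2)
    return (mx - my) / math.sqrt(pooled_var * (1 / nx + 1 / ny))


def dichotomise(values, threshold=18):
    """Code 1 for 'Strong' (>= threshold kg) and 0 for 'Not Strong'."""
    return [1 if v >= threshold else 0 for v in values]


t_raw = pooled_t(sucrose, sand)
t_dich = pooled_t(dichotomise(sucrose), dichotomise(sand))
print(f"t on kilograms:      {t_raw:.2f}")
print(f"t on 0/1 categories: {t_dich:.2f}")
```

With the same critical value of 2.10, the kilogram values reach significance but the 0/1 codes (t ≈ 1.34) do not, which mirrors the disagreement between the two tests.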