 However, if an inverse correlation is all the evidence you have, a second possibility exists.~
~
 Getting poor grades is frustrating; frustration often leads to escapist behavior; getting stoned is a popular means of escape; ergo, low grades cause marijuana smoking (G → M)! Unless you can establish which came first, smoking or low grades, this explanation is supported by the correlation just as plausibly as the first.~
~
 Let's introduce another variable into the picture: the existence and/or extent of emotional problems (variable E).~
~
 It could certainly be plausibly argued that having emotional problems may lead to escapist behavior, including marijuana smoking.~
~
 Likewise it seems reasonable to suggest that emotional problems are likely to adversely affect grades.~
~
 That correlation of marijuana smoking and low grades may exist for the same reason that runny noses and sore throats tend to go together: neither is the cause of the other; rather, both are the consequences of some third variable (E). Unless you can rule out such third variables, this explanation too is just as well supported by the data as is the first (or the second).~
~
 Then again, perhaps students smoke marijuana primarily because they have friends who smoke, and get low grades because they are simply not as bright or well prepared or industrious as their classmates, and the fact that it's the same students in each case in your sample is purely coincidental.~
~
 Unless your correlation is so strong and so consistent that mere coincidence becomes highly unlikely, this last possibility, while not supported by your data, is not precluded either.~
~
 Incidentally, this particular example was selected for two reasons.~
~
 First of all, every one of the above explanations for such an inverse correlation has appeared in a national magazine at one time or another.~
~
 And second, every one of them is probably doomed to failure because it turns out that, among college students, most studies indicate a direct correlation; that is, it is those with higher GPAs who are more likely to be marijuana smokers! Thus, with tongue firmly in cheek, we may reanalyze this particular finding:~
~
 1. Marijuana relaxes a person, clearing away other stresses, thus allowing more effective study; hence, M → G.~
~
 or~
~
 2. Marijuana is used as a reward for really hitting the books or doing well ("Wow, man! An 'A'! Let's go get high!"); hence, G → M.~
~
 or~
~
 3. A high level of curiosity (E) is definitely an asset to learning and achieving high grades and may also lead one to investigate "taboo" substances; hence, E → M and E → G.~
~
 or~
~
 4. Again coincidence, but this time the samples just happened to contain a lot of brighter, more industrious students whose friends smoke marijuana!~
~
 The obvious conclusion is this: if all of these are possible explanations for a relationship between two variables, then no one of them should be too readily singled out.~
~
 Establishing that two variables tend to occur together is a necessary condition for demonstrating a causal relationship, but it is not by itself a sufficient condition.~
~
 It is a fact, for example, that human birthrates are higher in areas of Europe where there are lots of storks, but as to the meaning of that relationship . . . !~
~
 To review, most social work researchers consider two variables to be causally related (that is, one causes the other) if (1) the cause precedes the effect in time, (2) there is an empirical correlation between them, and (3) the relationship between the two is not found to result from the effects of some third variable.~
~
 Any relationship that satisfies all of these criteria is causal, and these are the only criteria.~
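~
 The third criterion can be made concrete with a small simulation. The sketch below is only an illustration under assumed numbers: it invents a population in which emotional problems (E) raise the chance of marijuana smoking (M) and lower grades (G), while M has no effect at all on G. The function names, probabilities, and effect sizes are hypothetical and do not come from any study.~
~
```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_students(n=20000, seed=1):
    """Return (E, M, G) tuples in which E drives both M and G.

    All probabilities and effect sizes are made-up illustration values.
    """
    rng = random.Random(seed)
    students = []
    for _ in range(n):
        e = 1 if rng.random() < 0.5 else 0                  # emotional problems?
        m = 1 if rng.random() < (0.6 if e else 0.2) else 0  # smokes marijuana?
        g = 3.0 - 0.8 * e + rng.gauss(0, 0.3)               # GPA; M plays no role
        students.append((e, m, g))
    return students


def correlations(students):
    """Correlation of M with G overall and within each level of E."""
    overall = pearson([m for _, m, _ in students], [g for _, _, g in students])
    within = {}
    for level in (0, 1):
        group = [(m, g) for e, m, g in students if e == level]
        within[level] = pearson([m for m, _ in group], [g for _, g in group])
    return overall, within


if __name__ == "__main__":
    overall, within = correlations(simulate_students())
    print(f"M-G correlation, all students: {overall:+.2f}")
    print(f"M-G correlation, E = 0 only:   {within[0]:+.2f}")
    print(f"M-G correlation, E = 1 only:   {within[1]:+.2f}")
```
~
 In this invented population, smoking and grades are inversely correlated when all students are pooled, yet the correlation should be close to zero once students are compared only with others at the same level of E. That pattern is what "the relationship results from the effects of some third variable" looks like in data.~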
~
 INTERNAL VALIDITY~
~
 When we consider the extent to which a research study permits causal inferences to be made about relationships between variables, we again encounter the term validity.~
~
 You may recall that when we were discussing measurement validity in Chapter 8, we referred to validity as the extent to which a measure really measures what it intends to measure.~
~
 When discussing causal inference, however, the term is used differently.~
~
 Two forms of validity that are important when considering causality are internal validity and external validity.~
~
 Internal validity refers to the confidence we have that the results of a study accurately depict whether one variable is or is not a cause of another.~
~
 To the extent that the preceding three criteria for inferring causality are met, a study has internal validity.~
~
 Conversely, to the extent that we have not met these criteria, we are limited in our grounds for concluding that the independent variable does or does not play a causal role in explaining the dependent variable.~
~
 External validity refers to the extent to which we can generalize the findings of a study to settings and populations beyond the study conditions.~
~
 We will examine external validity later in this chapter, after we examine internal validity in some depth.~
~
 Let's begin that examination by discussing various threats to internal validity.~
~
 A threat to internal validity is present whenever anything other than the independent variable can affect the dependent variable.~
~
 When evaluating the effectiveness of programs or practice, for example, the problem of internal invalidity refers to the possibility that investigators might erroneously conclude that differences in outcome were caused by the evaluated intervention when, in fact, something else really caused the differences.~
~
 Campbell and Stanley (1963:5-6) and Cook and Campbell (1979:51-55) have identified various threats to internal validity.~
~
 Here are seven prominent ones:~
~
 1. History.~
~
 During the course of the research, extraneous events may occur that will confound the results.~
~
 The term history is tricky.~
~
 The extraneous events need not be major news events that one would read about in a history book, but simply extraneous events that coincide in time with the manipulation of the independent variable.~
~
 For example, suppose a study evaluates the effectiveness of social services in improving resident morale in a nursing home merely by measuring the morale of a group of residents before and after they receive social services.~
~
 Perhaps some extraneous improvement in the nursing home environment-an improvement independent of the social services-was introduced between the before and after measures.~
~
 That possibility threatens the internal validity of the research because the extraneous improvement, rather than the independent variable (social services), might cause the hypothesized improvement in the dependent variable (morale).~
~
 2. Maturation or the passage of time.~
~
 People continuously grow and change, whether they are a part of a research study or not, and those changes affect the results of the research.~
~
 In the above nursing home illustration, for example, it would be silly to infer that the greater physical frailty of residents several years after receiving social services was caused by the social services.~
~
 Maturation, through the aging process, would represent a severe threat to the internal validity of such a conclusion.~
~
 But this threat to internal validity does not require that basic developmental changes occur; it can also refer simply to the effects of the passage of time.~
~
 Suppose, for example, that a study to evaluate the effectiveness of a crisis counseling program for victims of rape merely assessed the mood state or social functioning of the victims before and after treatment.~
~
 We might expect the rape victims' emotional moods or social functioning levels to be at their worst in the immediate aftermath of the trauma.~
~
 With or without crisis counseling at that point, we might expect the mere passage of time to alleviate some portion of the terrible impact of the trauma, even if we assume that the long-term effects will still be devastating.~
~
 Likewise, consider bereavement counseling: It would be silly also to conclude that, just because the functioning level or mood of clients whose loved one died immediately before counseling was somewhat better after counseling, the bereavement counseling must have caused the improvement.~
~
 3. Testing.~
~
 Often the process of testing by itself will enhance performance on a test without any corresponding improvement in the real construct that the test attempts to measure.~
~
 Suppose we want to see whether a workshop helps social workers perform better on their state licensure exam.~
~
 We might construct a test that we think will measure the same sorts of things measured on the licensure exam and then administer that test to social workers before and after they take our workshop.~
~
 If their scores on the exam improve, then we might wish to attribute the improvement to the effects of our workshop.~
~
 But suppose the social workers, after taking the first test, looked up answers to test items before our workshop began and remembered those answers the next time they took the same test.~
~
 They would then score higher on the posttest without even attending our workshop, and we could not claim therefore that taking our workshop caused their scores to improve.~
~
 4. Instrumentation changes.~
~
 If we use different measures of the dependent variable at posttest than we did at pretest, how can we be sure that they are comparable to each other? Suppose in evaluating the workshop to help social workers perform better on their state licensure exam we do not want workshop participants to take the same test twice (to avoid testing effects).~
~
 We might therefore construct two versions of the outcome test-one for the pretest and one for the posttest-that we think are equivalent.~
~
 Although we would like to conclude that our workshop caused any improvement in scores, it is conceivable that the real reason may have been that, despite our best efforts, the posttest version was an easier exam than the pretest version.~
~
 And if their scores worsened, rather than indicating that our workshop made them less well prepared for the exam, perhaps the posttest version was more difficult.~
~
 Analogous possibilities can occur when the measurement instruments involve ratings made by researchers or practitioners based on interviews with participants or observations of their behavior.~
~
 Perhaps the researchers or practitioners who provided the posttest ratings were not the same ones who provided the pretest ratings.~
~
 Perhaps one set of raters had different standards or abilities than the other set.~
~
 Even if the same raters are used at both pretest and posttest, their standards or their abilities may have changed over the course of the study.~
~
 Perhaps their skill in observing and recording behaviors improved as they gained more experience in doing so over the course of the study, enabling them to observe and record more behaviors at posttest than at pretest.~
~
 A more subtle type of change in instrumentation can occur in experimental studies of children.~
~
 If a long time passes from pretest to posttest, the child may have outgrown some of the pretest items.~
~
 For example, if a scale devised to measure the self-esteem of children aged 6 to 12 is administered to children aged 12 at pretest and then aged 15 three years later, the scale items may not have the same meaning to the participants as teenagers as they did to them as 12-year-olds.~
~
 5. Statistical regression.~
~
 Sometimes it's appropriate to evaluate the effectiveness of services for clients who were referred because of their extreme scores on the dependent variable.~
~
 Suppose, for example, that a new social work intervention to alleviate depression among the elderly is being pilot tested in a nursing home among residents whose scores on a depression inventory indicate the most severe levels of depression.~
~
 From a clinical standpoint, it would be quite appropriate to provide the service to the residents who appear most in need of the service.~
~
 But consider from a methodological standpoint what is likely to happen to the depression scores of the referred residents even without intervention.~
~
 In considering this, we should be aware that, with repeated testing on almost any assessment inventory, an individual's scores on the inventory are likely to fluctuate somewhat from one administration to the next-not because the individual really changed, but because of the random testing factors that prevent instruments from having perfect reliability.~
~
 For example, some residents who were referred because they had the poorest pretest scores may have had atypically bad days at pretest and may score better on the inventory on an average day.~
~
 Perhaps they didn't sleep well the night before the pretest, perhaps a chronic illness flared up that day, or perhaps a close friend or relative passed away that week.~
~
 When we provide services to only those people with the most extremely problematic pretest scores, the odds are that the proportion of service recipients with atypically bad pretest scores will be higher than the proportion of nonrecipients with atypically bad pretest scores.~
~
 Conversely, those who were not referred because their pretest scores were better probably include some whose pretest scores were atypically high (that is, people who were having an unusually good day at pretest).~
~
 Consequently, even without any intervention, the group of service recipients is more likely to show some improvement in its average depression score over time than is the group that was not referred.~
~
 There is a danger, then, that changes occurring because subjects started out in extreme positions will be attributed erroneously to the effects of the independent variable.~
~
 Statistical regression is a difficult concept to grasp.~
~
 It might aid your understanding of this term to imagine or actually carry out the following amusing experiment, which we have adapted from Posavac and Carey (1985).~
~
 Grab a bunch of coins-15 to 20 will suffice.~
~
 Flip each coin six times and record the number of heads and tails you get for each coin.~
~
 That number will be the pretest score for each coin.~
~
 Now, refer each coin that had no more than two heads on the pretest to a social work intervention that combines task-centered and behavioral practice methods.~
~
 Tell each referred coin (yes, go ahead and speak to it-but first make sure you are truly alone) that tails is an unacceptable behavior and that therefore its task is to try to come up heads more often.~
~
 After you give each its task, flip it six more times, praising it every time it comes up heads.~
~
 If it comes up tails, say nothing because you don't want to reward undesirable behavior.~
~
 Record, as a posttest, the number of heads and tails each gets.~
~
 Compare the total number of posttest heads with the total number of pretest heads for the referred coins.~
~
 The odds are that the posttest is higher.~
~
 Now flip the nonreferred coins six times and record their posttest scores, but say nothing at all to them.~
~
 We do not want them to receive the intervention; that way we can compare the pretest-posttest change of the coins that received the intervention to what happened among those coins that did not receive the intervention.~
~
 The odds are that the untreated coins did not show nearly as much of an increase in the number of heads as did the treated coins.~
~
 This experiment works almost every time.~
~
 If you got results other than those described here, odds are that if you replicated the experiment you would get such results the next time.~
~
 What do these results mean? Is task-centered and behavioral casework an effective intervention with coins? According to the scientific method, our minds should be open to this possibility-but we doubt it.~
~
 Rather, we believe these results illustrate that if we introduce the independent variable only to those referred on the basis of extreme scores, we can expect some improvement in the group solely because those scores will statistically regress to (which means they will move toward) their true score.~
~
 In this case, the coins tend to regress to their true score of three (50 percent) heads and three (50 percent) tails.~
~
 When the assessment is done on people, we can imagine their true score as being the mean score they would get if tested many times on many different days.~
~
 Of course, human behavior is more complex than that of coins.~
~
 But this illustration represents a common problem in the evaluation of human services.~
~
 Because we are most likely to begin interventions for human problems that are inherently variable when those problems are at their most severe levels, we can expect some amelioration of the problem to occur solely because of the natural peaks and valleys in the problem and not necessarily because of the interventions.~
~
 6. Selection biases.~
~
 Comparisons don't have any meaning unless the groups being compared are really comparable.~
~
 Suppose we sought to evaluate the effectiveness of an intervention to promote positive parenting skills by comparing the level of improvement in parenting skills of parents who voluntarily agreed to participate in the intervention program with the level of improvement of parents who refused to participate.~
~
 We would not be able to attribute the greater improvement among program participants to the effects of the intervention-at least not with a great deal of confidence about the internal validity of our conclusion-because other differences between the two groups might explain away the difference in improvement.~
~
 For example, the participants may have been more motivated than program refusers to improve and thus may have been trying harder, reading more, and doing any number of things unrelated to the intervention that may really explain why they showed greater improvement.~
~
 Selection biases are a common threat to the internal validity of social service evaluations because groups of service recipients and nonrecipients are often compared on outcome variables in the absence of prior efforts to see that the groups being compared were initially truly equivalent.~
~
 Perhaps this most typically occurs when individuals who choose to use services are compared with individuals who were not referred to those services or who chose not to utilize them.~
~
 7. Ambiguity about the direction of causal influence.~
~
 As we discussed earlier in this chapter, there is a possibility of ambiguity concerning the time order of the independent and dependent variables.~
~
 Whenever this occurs, the research conclusion that the independent variable caused the changes in the dependent variable can be challenged with the explanation that the "dependent" variable actually caused changes in the "independent" variable.~
~
 Suppose, for example, a study finds that clients who completed a substance abuse treatment program are less likely to be abusing substances than those who dropped out of the program.~
~
 There would be ambiguity as to whether the program influenced participants not to abuse substances or whether the abstinence from substance abuse helped people complete the program.~
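~
 Of the seven threats, statistical regression is the easiest to demonstrate by simulation. The following sketch imitates the coin exercise described under threat 5, assuming fair coins and treating the "intervention" as having no effect whatsoever. The function name `coin_experiment` and its parameters are hypothetical, chosen only for this illustration.~
~
```python
import random


def coin_experiment(n_coins=20, n_flips=6, cutoff=2, seed=None):
    """Flip fair coins at pretest and posttest; 'refer' low pretest scorers.

    Returns mean heads per coin as
    {"referred": (pretest, posttest), "nonreferred": (pretest, posttest)}.
    A group that ends up with no coins is reported as None.
    """
    rng = random.Random(seed)

    def score():
        # Number of heads in n_flips tosses of a fair coin.
        return sum(rng.random() < 0.5 for _ in range(n_flips))

    groups = {"referred": [], "nonreferred": []}
    for _ in range(n_coins):
        pretest = score()
        posttest = score()  # the "intervention" cannot change a fair coin
        key = "referred" if pretest <= cutoff else "nonreferred"
        groups[key].append((pretest, posttest))

    summary = {}
    for key, pairs in groups.items():
        if pairs:
            summary[key] = (
                sum(pre for pre, _ in pairs) / len(pairs),
                sum(post for _, post in pairs) / len(pairs),
            )
        else:
            summary[key] = None
    return summary


if __name__ == "__main__":
    # Many coins are used here so that the averages are stable.
    for label, means in coin_experiment(n_coins=20000, seed=0).items():
        print(label, means)
```
~
 With only 15 to 20 coins, as in the hand-run version, results vary from one run to the next. With thousands of simulated coins, the referred group's average should rise toward three heads at posttest even though nothing was done to the coins, and the nonreferred group's average should drift down toward three.~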
~
 PRE-EXPERIMENTAL PILOT STUDIES~
~
 Not all evaluations of social work interventions strive to produce conclusive, causal inferences.~
~
 Some have an exploratory or descriptive purpose and thus can have considerable value despite having a low degree of internal validity.~
~
 Thus, when we say that a particular design has low internal validity, we are not saying that you should never use that design or that studies that do so never have value.~
~
 Suppose, for example, that your agency has initiated a new, innovative intervention for a small target group about which little is known.~
~
 It might be quite useful to find out whether clients' posttest scores are better (or perhaps worse!) than their pretest scores.~
~
 If the posttest scores are much better, then it is conceivable that the intervention is the cause of the improvement.~
~
 However, it would be inappropriate to make a conclusive causal inference to that effect because you haven't controlled for history, passage of time, and various other threats to internal validity.~
~
 Nevertheless, you have shown a correlation between time (pre versus post) and scores, and you have established time order in that the improved scores came after the intervention.~
~
 Such results, therefore, would provide a basis for supporting the plausibility that the intervention is effective and for testing its effectiveness further with a stronger (more internally valid) design.~
~
 Moreover, if you seek funding for a more ambitious study, your credibility to potential funding sources will be enhanced if you can include in your proposal for funding evidence that you were able to successfully carry out a pilot study and that its results were promising.~
~
 Sometimes, however, investigators report pre-experimental studies as if they were valid tests of the effectiveness of an intervention.~
~
 Although they usually acknowledge the limitations of their pre-experimental designs, they sometimes draw conclusions that suggest to the unwary reader that the evaluated intervention is effective and should be considered evidence-based.~
~
 This is unfortunate, because-despite their value as pilot studies-pre-experimental designs rank low on the evidence-based practice research hierarchy due to their negligible degree of internal validity.~
~
 Let's now examine some common pre-experimental designs and consider why they have low internal validity.~
~
 One-Shot Case Study~
~
 One particularly weak pre-experimental design, the one-shot case study, doesn't even establish correlation.~
~
 The shorthand notation for this design is~
~
 X O~
~
 The X in this notation represents the introduction of a stimulus, such as an intervention.~
~
 The O represents observation, which yields the measurement of the dependent variable.~
~
 In this design, a single group of research participants is measured on a dependent variable after the introduction of an intervention (or some other stimulus) without comparing the obtained results to anything else.~
~
 For instance, a service might be delivered and then the service recipients' social functioning measured.~
~
 This design offers no way for us to ascertain whether the observed level of social functioning is any higher (or lower!) than it was to begin with, or any higher (or lower!) than it is among comparable individuals who received no service.~
~
 Thus, this design-in addition to failing to assess correlation-fails to control for any of the threats to internal validity.~
~
 One-Group Pretest-Posttest Design~
~
 A pre-experimental design that establishes both correlation and time order, and which therefore has more value as a pilot study, is the one-group pretest-posttest design.~
~
 This design assesses the dependent variable before and after the stimulus (intervention) is introduced.~
~
 Thus, in the evaluation of the effectiveness of social services, the design would assess the outcome variable before and after services are delivered.~
~
 The shorthand notation for this design is~
~
 O₁ X O₂~
~
 The subscripts 1 and 2 in this notation refer to the sequential order of the observations; thus, O₁ is the pretest before the intervention, and O₂ is the posttest after the intervention.~
~
 Despite its value as a pilot study, this design does not account for factors other than the independent variable that might have caused the change between pretest and posttest results-factors usually associated with the following threats to internal validity: history, maturation, testing, and statistical regression.~
~
 Suppose, for example, that we assess the attitudes of social work students about social action strategies of community organization-strategies that emphasize tactics of confrontation and conflict (protests, boycotts, and so on)-before and at the end of their social work education.~
~
 Suppose we find that over this time they became less committed to confrontational social action strategies and more in favor of consensual community development approaches.~
~
 Would such a finding permit us to infer that the change in their attitude was caused by their social work education? No, it would not.~
~
 Other factors could have been operating during the same period and caused the change.~
~
 For instance, perhaps the students matured and became more tolerant of slower, more incremental strategies for change (the threat to internal validity posed by maturation or the passage of time).~
~
 Or perhaps certain events extraneous to their social work education transpired during that period and accounted for their change (the threat of history).~
~
 For example, perhaps a series of protest demonstrations seemed to backfire and contribute to the election of a presidential candidate they abhorred, and their perception of the negative effects of these demonstrations made them more skeptical of social action strategies.~
~
 In another example of the one-group pretest-posttest design, suppose we assess whether a three-month cognitive-behavioral intervention with abusive parents results in higher scores on a paper-and-pencil test of parenting skills and cognitions about childhood behaviors.~
~
 In addition to wondering whether history and maturation might account for the improvement in scores, we would wonder about statistical regression.~
~
 Perhaps the parents were referred for treatment at a time when their parental functioning and attitudes about their children were at their worst.~
~
 Even if the parenting skills and cognitions of these parents were quite unacceptable when they were at their best, improvement from pretest to posttest might simply reflect the fact that they were referred for treatment when they were at their worst and that their scores therefore couldn't help but increase somewhat because of regression toward their true average value before intervention.~
~
 Posttest-Only Design with Nonequivalent Groups (Static-Group Comparison Design)~
~
 A third pre-experimental design is the posttest-only design with nonequivalent groups.~
~
 The shorthand notation for this design, which has also been termed the static-group comparison design, is~
~
 X O~
   O~
~
 This design assesses the dependent variable after the stimulus (intervention) is introduced for one group, while also assessing the dependent variable for a second group that may not be comparable to the first group and that was not exposed to the independent variable.~
~
 In the evaluation of the effectiveness of social services, this design would entail assessing clients on an outcome variable only after (not before) they receive the service being evaluated and comparing their performance with a group of clients who did not receive the service and who plausibly may be unlike the treated clients in some meaningful way.~
~
 Let's return, for example, to the preceding hypothetical illustration about evaluating the effectiveness of a cognitive-behavioral intervention with abusive parents.~
~
 Using the posttest-only design with nonequivalent groups rather than comparing the pretest and posttest scores of parents who received the intervention, we might compare their posttest scores to the scores of abusive parents who were not referred or who declined the intervention.~
~
 We would hope to show that the treated parents scored better than the untreated parents, because this would indicate a desired correlation between the independent variable (treatment status) and the dependent variable (test score).~
~
 But this correlation would not permit us to infer that the difference between the two groups was caused by the intervention.~
~
 The most important reason for this is the design's failure to control for the threat of selection biases.~
~
 Without pretests, we have no way of knowing whether the scores of the two groups would have differed as much to begin with-that is, before the treated parents began treatment.~
~
 Moreover, these two groups may not really have been equivalent in certain important respects.~
~
 The parents who were referred or who chose to participate may have been more motivated to improve or may have had more supportive resources than those who were not referred or who refused treatment.~
~
 It is not at all uncommon to encounter program providers who, after their program has been implemented, belatedly realize that it might be useful to get evaluative data on its effectiveness and are therefore attracted to the posttest-only design with nonequivalent groups.~
~
 In light of the expedience of this type of design, and in anticipation of the practical administrative benefits to be derived from positive outcome findings, the providers may not want to hear about selection biases and low internal validity.~
~
 But we nevertheless hope you will remember the importance of these issues and tactfully discuss them with others when the situation calls for it.~
~
 Figure 10-1 graphically illustrates the three pre-experimental research designs just discussed.~
~
 See if you can visualize where the potentially confounding and misleading factors could intrude into each design.~
~
 As we noted earlier, some studies using pre-experimental designs can be valuable despite their extremely limited degree of internal validity.~
~
 What tends to make them valuable is their use as pilot studies purely for one or more of the following purposes: (1) to generate tentative exploratory or descriptive information regarding a new intervention about which little is known; (2) to learn whether it is feasible to provide the new intervention as intended; (3) to identify obstacles in carrying out methodological aspects of a more internally valid design that is planned for the future; and (4) to see if the hypothesis for a more rigorous study remains plausible based on the pilot study results.~
~
 Figure 10-1 Three Pre-experimental Research Designs~
~
 1. THE ONE-SHOT CASE STUDY. Administer the experimental stimulus to a single group and measure the dependent variable in that group afterward. Make an intuitive judgment as to whether the posttest result is "high" or "low." (Diagram, over time: stimulus, then posttest; the posttest is compared with an intuitive estimate of the "normal" level of the dependent variable.)~
~
 2. THE ONE-GROUP PRETEST-POSTTEST DESIGN. Measure the dependent variable in a single group, administer the experimental stimulus, and then remeasure the dependent variable. Compare pretest and posttest results. (Diagram, over time: pretest, stimulus, posttest; the pretest and posttest are compared.)~
~
 3. THE STATIC-GROUP COMPARISON. Administer the experimental stimulus to one group (the experimental group), then measure the dependent variable in both the experimental group and a comparison group. (Diagram, over time: the experimental group receives the stimulus and then a posttest; the comparison group receives only a posttest; the two posttests are compared.)~
~
 The box "A Scientific Study and a Pseudoscientific Study Using Pre-Experimental Designs" provides two examples of published studies that used pre-experimental designs.~
~
 The first illustrates a valuable pilot study that meets all of the foregoing four criteria.~
~
 The second meets none of them.~
~
 Now that we've discussed both the utility and the limitations of pre-experimental designs, let's examine some designs that have higher levels of internal validity.~
~
 EXPERIMENTAL DESIGNS~
~
 Experimental designs attempt to provide maximum control for threats to internal validity by first randomly assigning research participants to experimental and control groups.~
~
 Next, they introduce one category of the independent variable (such as a new program or intervention method) to the experimental group while withholding it from the control group.~
~
 Then they compare the extent to which the experimental and control groups differ on the dependent variable.~
~
 The latter comparison, though coming at the end of the sequence, usually involves assessing the experimental and control groups before and after introducing the independent variable.~
~
 For example, suppose we wanted to assess the effectiveness of an intervention used by gerontological social workers in nursing home facilities, an intervention that engages clients in a review of their life history in order to alleviate depression and improve morale.~
~
 Rather than just compare residents who requested the intervention with those who did not-which would be vulnerable to a selection bias because we could not assume the two groups were equivalent to begin with-our experimental approach would use a random assignment procedure (such as coin tosses).~
~
 Thus, each resident who agrees to participate and for whom the intervention is deemed appropriate would be randomly assigned to either an experimental group (which would receive the intervention) or a control group (which would not receive it).~
~
 Observations on one or more indicators of depression and morale (the dependent variables) would be taken before and after the intervention is delivered.~
~
 To the extent that the experimental group's mood improves more than that of the control group, the findings would support the hypothesis that the intervention causes the improvement.~
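~
 The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are invented (higher means more depressed), and the names mean_change and difference_in_change are hypothetical helpers, not part of any study or library.~
~
```python
from statistics import mean

# Hypothetical depression scores (higher = more depressed); invented for illustration.
experimental = {"pretest": [22, 25, 19, 28, 24], "posttest": [15, 18, 14, 20, 17]}
control = {"pretest": [23, 24, 20, 27, 25], "posttest": [21, 23, 19, 26, 23]}


def mean_change(group):
    """Average posttest-minus-pretest change for one group."""
    return mean(post - pre for pre, post in zip(group["pretest"], group["posttest"]))


def difference_in_change(experimental_group, control_group):
    """How much more the experimental group changed than the control group."""
    return mean_change(experimental_group) - mean_change(control_group)


print(f"Experimental group mean change: {mean_change(experimental):.1f}")
print(f"Control group mean change: {mean_change(control):.1f}")
print(f"Difference in change: {difference_in_change(experimental, control):.1f}")
```
~
 A negative difference here means that depression scores dropped more in the experimental group than in the control group. A real analysis would also test whether the difference is statistically significant, which this sketch does not attempt.~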
~
 Figure 10-2 Diagram of Basic Experimental Design. EXPERIMENTAL GROUP: Measure Dependent Variable, Administer Experimental Stimulus, Remeasure Dependent Variable. CONTROL GROUP: Measure Dependent Variable, Remeasure Dependent Variable. Compare the two groups' initial measures (Same?) and their remeasures (Different?).~
~
  A SCIENTIFIC STUDY AND A PSEUDOSCIENTIFIC STUDY USING PRE-EXPERIMENTAL DESIGNS~
~
 A Scientific Pilot Study~
~
 Nancy Grote and her social work faculty colleagues wanted to find out if a briefer and more culturally relevant form of interpersonal psychotherapy (IPT), an evidence-based treatment for depression, would be feasible to provide to depressed pregnant patients on low incomes (most of whom were African American) in an obstetric and gynecological clinic.~
~
 They postulated that modifying the IPT treatment protocol would help overcome barriers to service use among low-income and minority patients, such as cost, transportation, child care, cultural insensitivity, and so on.~
~
 Twelve pregnant depressed women were recruited to be in the study, and nine of them completed the eight sessions of treatment.~
~
 Those nine women displayed significant improvement at posttreatment and six months later on measures of depression, anxiety, and social functioning.~
~
 The authors concluded that their preliminary results suggested that: (1) the modified IPT approach appears to be feasible for depressed pregnant patients on low incomes; and (2) the hypothesis that the modified IPT treatment approach is effective in treating depression among such patients is sufficiently supported to warrant testing in a subsequent, more internally valid, experiment.~
~
 One admirable feature of their published study was the way the authors discussed its limitations, noting that their sample was small and perhaps atypical and that they lacked experimental design components that would have controlled for various threats to internal validity.~
~
 Also admirable was the way they did not go beyond their exploratory pilot study aims in interpreting their outcome data.~
~
 Unlike some authors who report studies using pre-experimental designs, Grote and her colleagues avoided making conclusive causal inferences about the effectiveness of their intervention.~
~
 Source: Grote, N. K., Bledsoe, S. E., Swartz, H. A., and Frank, E. 2004. "Feasibility of Providing Culturally Relevant, Brief Interpersonal Psychotherapy for Antenatal Depression in an Obstetrics Clinic: A Pilot Study." Research on Social Work Practice, 14, 6, 397-407.~
~
  A Pseudoscientific Study~
~
 Not all authors who report pilot studies interpret their results as cautiously as did Grote and her colleagues.~
~
 Nor do all reports of studies using pre-experimental designs claim to have a pilot study purpose.~
~
 Authors of such reports often interpret their results inappropriately, making conclusive causal inferences that are unwarranted in light of their design's weak internal validity.~
~
 Such was the case in an evaluation of the effectiveness of Thought Field Therapy (TFT) by Johnson et al. (2001).~
~
 As you read this summary, we suggest you recall our portrayal of pseudoscience back in Chapter 1.~
~
 The egregious claims of extravagant treatment success in this one-shot case study have led critics to call it pseudoscience.~
~
 TFT is an extremely controversial therapy that uses "mechanical stimulation of points on the body's energy meridians, plus bilateral optical cortical stimulation" in an effort to relieve psychological trauma symptoms (p. 1239).~
~
 In the aftermath of massacre and torture by invading Serbian forces in 1999, TFT was provided to 105 traumatized ethnic Albanian survivors in Kosovo.~
~
 There was no pretest.~
~
 Assessment was based on whether recipients of TFT said their trauma was or was not gone after receiving TFT.~
~
 The report does not specify whether they said this to the TFT therapist or someone else, but in either case the serious potential for social desirability bias in the self-reporting to foreigners who volunteered to help them is compounded by the utter lack of any attempts to control for threats to internal validity.~
~
 But rather than discuss their results cautiously in terms of assessing the feasibility of their treatment or design protocol, or generating hypotheses for future research, the authors claimed that the treatment was "clearly effective" for all but two of the 105 patients who received it, and all but two of their 249 separate traumas! We are not aware of any well-controlled, internally valid evaluation of a psychosocial intervention-including those interventions widely accepted as having the best empirical support and being the most effective-that has had results anywhere near that extreme.~
~
 The journal that published this study inserted a footnote on the article's first page.~
~
 The footnote says that the article was not peer reviewed.~
~
 Instead, the journal editor agreed to publish it after the discoverer of TFT complained that the journal's review process was biased against TFT.~
~
 But the editor agreed to publish it only if it was followed by another article that critically appraised the TFT study.~
~
 The criticism in that accompanying article was scathing!~
~
 Source: Johnson, C., Shala, M., Sedjejaj, X., Odell, R., and Dabishevci, K. (2001). "Thought Field Therapy - Soothing the Bad Moments of Kosovo." Journal of Clinical Psychology, 57, 10, 1237-1240.~
~
  The preceding example illustrates the classic experimental design, also called the pretest-posttest control group design.~
~
 This design is diagrammed in Figure 10-2.~
~
 The shorthand notation for this design is~
~
     R   O1   X   O2~
     R   O1       O2~
~
 The R in this design stands for random assignment of research participants to either the experimental group or the control group.~
~
 O1 represents pretests, and O2 represents posttests.~
~
 The X represents the tested intervention.~
~
 Notice how this design controls for many threats to internal validity.~
~
 If the improvement in mood were caused by history or maturation, then there would be no reason the experimental group should improve any more than the control group.~
~
 Likewise, because the residents were assigned on a randomized basis, there is no reason to suppose that the experimental group was any more likely to statistically regress to less extreme scores than was the control group.~
~
 Random assignment also removes any reason for supposing that the two groups were different initially with respect to the dependent variable or to other relevant factors such as motivation or psychosocial functioning.~
~
 The box "A Social Work Experiment Evaluating the Effectiveness of a Program to Treat Children at Risk of Serious Conduct Problems" summarizes a published social work experiment that used a pretest-posttest control group design.~
~
 Notice also, however, that the pretest-posttest control group design does not control for the possible effects of testing and retesting.~
~
 If we think that taking a pretest might have an impact on treatment effects, or if we think that it might bias participants' posttest responses, then we might opt for an experimental design called the posttest-only control group design.~
~
 Another, more common, reason for choosing the posttest-only control group design is that pretesting may not be possible or practical, such as in the evaluation of the effectiveness of programs to prevent incidents of child abuse.~
~
 The shorthand notation for this design is~
~
     R   X   O~
     R       O~
~
 This design assumes that the process of random assignment removes any significant initial differences between experimental and control groups.~
~
 This assumption of initial group equivalence permits the inference that any differences between the two groups at posttest reflect the causal impact of the independent variable.~
~
 The box "A Social Work Experiment Evaluating Motivational Interviewing" summarizes a published study that used a posttest-only control group experimental design.~
~
 If we would like to know the amount of pretest-posttest change but are worried about testing effects,~
~
  A SOCIAL WORK EXPERIMENT EVALUATING THE EFFECTIVENESS OF A PROGRAM TO TREAT CHILDREN AT RISK OF SERIOUS CONDUCT PROBLEMS~
~
 Mark Fraser and his associates evaluated the effectiveness of a multi-component intervention to treat children referred by teachers for aggressive antisocial behavior and rejection by their prosocial peers.~
~
 The children were assigned randomly to an experimental group or a wait-list control group.~
~
 The experimental group children participated in a social skills training program in after-school or school settings, and their parents or caretakers participated in an in-home family intervention program designed to increase parenting skills.~
~
 The control group children "continued to participate in any routine services they may have [already] been receiving."~
~
 At the conclusion of the study, the children and parents in the control condition were offered the same intervention package as the experimental group participants received.~
~
 Outcome was measured by having teachers complete a form at pretest and posttest on which they rated each child's behaviors in classroom and play environments.~
~
 The results showed the experimental group children had significantly more improvement than control group children on ratings of prosocial behavior, ability to regulate emotions, and increased social contact with peers.~
~
 Source: Fraser, M., Day, S. H., Galinsky, M. J., Hodges, V. G., and Smokowski, P. R. 2004. "Conduct Problems and Peer Rejection in Childhood: A Randomized Trial of the Making Choices and Strong Families Programs," Research on Social Work Practice, 14, 5, 313-324.~
~
  A SOCIAL WORK EXPERIMENT EVALUATING MOTIVATIONAL INTERVIEWING~
~
 Robert Schilling and his colleagues evaluated a motivational interviewing intervention designed to encourage detoxified alcohol users to participate in self-help groups after alcohol detoxification.~
~
 Ninety-six clients were randomly assigned to either a three-session motivational interviewing condition or a standard care condition.~
~
 Motivational interviewing is directive but uses client-centered relationship skills (such as being empathic, warm, and genuine) in providing information and feedback to increase client awareness of the problem and consideration of change by helping clients see the discrepancy between their problematic behavior and their broader goals.~
~
 Outcome was assessed two months after discharge from inpatient care via self-reported attendance at self-help meetings and drinking behavior.~
~
 Although motivational interviewing currently is widely accepted as an evidence-based intervention, this study's results were portrayed by its authors as somewhat disappointing.~
~
 No differences in drinking behavior were found between the experimental and control groups; however, motivational interviewing recipients averaged twice as many days participating in 12-step self-help groups.~
~
 Source: Schilling, R. F., El-Bassel, N., Finch, J. B., Roman, R. J., and Hanson, M. 2002. "Motivational Interviewing to Encourage Self-Help Participation Following Alcohol Detoxification," Research on Social Work Practice, 12, 6, 711-730.~
~
  then we could use a fancy design called the Solomon four-group design.~
~
 The shorthand notation for this design is~
~
     R   O1   X   O2~
     R   O1       O2~
     R        X   O2~
     R            O2~
~
 This design, which is highly regarded by research methodologists but rarely used in social work studies, combines the classical experimental design with the posttest-only control group design.~
~
 It does this simply by randomly assigning research participants to four groups instead of two.~
~
 Two of the groups are control groups, and two are experimental groups.~
~
 One control group and one experimental group are pretested and posttested.~
~
 The other experimental and control group are posttested only.~
~
 If special effects are caused by pretesting, then they can be discerned by comparing the two experimental group results with each other and the two control group results with each other.~
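~
 As a rough sketch of that comparison, the Python below contrasts the posttest means of the pretested and unpretested groups within each condition. The scores are invented for illustration, and pretest_effect is a hypothetical helper name rather than an established procedure.~
~
```python
from statistics import mean

# Hypothetical posttest scores for the four randomly assigned groups.
posttests = {
    ("experimental", "pretested"): [14, 15, 13, 16],
    ("experimental", "unpretested"): [17, 18, 16, 19],
    ("control", "pretested"): [22, 23, 21, 24],
    ("control", "unpretested"): [22, 24, 21, 23],
}


def pretest_effect(condition):
    """Posttest mean of the pretested group minus that of the unpretested group."""
    pretested = mean(posttests[(condition, "pretested")])
    unpretested = mean(posttests[(condition, "unpretested")])
    return pretested - unpretested


for condition in ("experimental", "control"):
    print(f"{condition}: pretest effect = {pretest_effect(condition):.1f}")
```
~
 In these made-up data the two control groups have the same posttest mean, while the pretested experimental group scores lower than the unpretested one, a pattern that would suggest the pretest interacted with the intervention. An actual study would rely on significance tests rather than raw differences in means.~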
~
 Sometimes experiments are used to compare the effectiveness of two alternative treatments.~
~
 Pretests are recommended in such experiments so that the comparative amounts of change produced by each treatment can be assessed.~
~
 This design is called the alternative treatment design with pretest (Shadish, Cook, and Campbell, 2001).~
~
 The shorthand notation for this design is~
~
     R   O1   A   O2~
     R   O1   B   O2~
     R   O1       O2~
~
 The first row above represents the participants randomly assigned to Treatment A.~
~
 The second row represents the participants randomly assigned to Treatment B.~
~
 The third row represents the participants randomly assigned to a control group.~
~
 To show that Treatment A is more effective than Treatment B, the first row would need to show more improvement from O1 to O2 than both of the other rows.~
~
 If the first two rows both show approximately the same amounts of improvement, and both amounts are more than in the third row, that would indicate that both treatments are approximately equally effective.~
~
 But if the third row shows the same degree of improvement as in the first two rows, then neither treatment would appear to be effective.~
~
 Instead, we would attribute the improvement in all three rows to an alternative explanation such as history or the passage of time.~
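~
 The interpretive rules just described can be written out as a small decision function. The sketch below is illustrative only: interpret_alternative_treatments is a hypothetical name, improvement is expressed as a positive number, and the tolerance parameter is a crude stand-in (an assumption made here) for the significance testing a real study would use.~
~
```python
def interpret_alternative_treatments(change_a, change_b, change_control, tolerance=1.0):
    """Apply the three-row interpretation rules to mean improvement scores.

    Larger values mean more improvement from pretest to posttest.
    `tolerance` is an assumed threshold for "approximately the same".
    """

    def about_equal(x, y):
        return abs(x - y) <= tolerance

    # Control group improved as much as both treatment groups.
    if about_equal(change_a, change_control) and about_equal(change_b, change_control):
        return "neither treatment appears effective"
    # Both treatments improved about equally, and more than the control group.
    if about_equal(change_a, change_b) and min(change_a, change_b) > change_control + tolerance:
        return "both treatments appear about equally effective"
    # One treatment improved more than both of the other rows.
    if change_a > change_b + tolerance and change_a > change_control + tolerance:
        return "Treatment A appears more effective than Treatment B"
    if change_b > change_a + tolerance and change_b > change_control + tolerance:
        return "Treatment B appears more effective than Treatment A"
    return "pattern not covered by these rules"


print(interpret_alternative_treatments(8.0, 4.0, 2.0))
print(interpret_alternative_treatments(8.0, 7.5, 2.0))
print(interpret_alternative_treatments(5.0, 5.5, 5.0))
```
~
 The three calls correspond, in order, to the three outcomes discussed above: one treatment outperforming the other, both treatments outperforming the control group equally, and all three rows improving alike.~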
~
 Some experiments use the first two rows of this design but not the third row.~
~
 In other words, they compare the two treatments to each other but not to a control group.~
~
 Such experiments can have conclusive, valid findings if one group improves significantly more than the other.~
~
 But suppose they both have roughly the same amount of improvement.~
~
 The temptation would be to call them equally effective.~
~
 However, with no control group, we cannot rule out threats to internal validity, such as history or the passage of time, as alternative explanations of the improvement in both groups.~
~
 An illustrative study is summarized in the box "A Social Work Experiment Comparing the Effectiveness of Two Approaches to Spouse Abuse Treatment."~
~
 A similar type of design can be used to see not only whether an intervention is effective, but also which components of the intervention may or may not be necessary to achieve its effects.~
~
 Experiments using this design are called dismantling studies.~
~
 The shorthand notation for this design is~
~
     R   O1   AB   O2~
     R   O1   A    O2~
     R   O1   B    O2~
     R   O1        O2~
~
 The first row above represents the participants randomly assigned to a treatment that contains components A and B.~
~
 The second row represents the participants randomly assigned to receive the A component only.~
~
 The third row represents the participants randomly assigned to receive the B component only.~
~
 The fourth row represents the participants randomly assigned to a control group.~
~
 If the first row shows more improvement from O1 to O2 than all of the other rows, then that would indicate that the treatment is effective and that both components (A and B) are needed.~
~
 If either of the next two rows shows as much improvement as the first row shows, then that would indicate that the component signified in that row is all that is needed to achieve the effects shown in the first row, and that the other component may not be needed.~
~
  A SOCIAL WORK EXPERIMENT COMPARING THE EFFECTIVENESS OF TWO APPROACHES TO COURT-MANDATED SPOUSE ABUSE TREATMENT~
~
 For his dissertation, Stephen Brannen conducted an experiment with intact couples desiring to stay in their current relationship who were referred by a court in San Antonio for spouse abuse treatment.~
~
 The couples were assigned to one of two approaches to group cognitive-behavioral therapy.~
~
 In one approach the couples participated together in the group; in the other approach they participated apart in separate groups for each gender.~
~
 Outcome was measured using standardized self-report scales that measured the couples' conflict resolution ability, the level of violence in the relationship, the level of communication and marital satisfaction within the relationship, and recidivism.~
~
 Data were collected both from the victims and perpetrators of the abuse.~
~
 Significant pretest to posttest gains were found for both groups; however, there was no difference between the groups.~
~
 Consequently, although the findings were consistent with the notion that both interventions are about equally effective, without a no-treatment or routine treatment control group Brannen could not rule out threats like history or the passage of time as possible causes of the improvement.~
~
 Source: Brannen, S. J. and Rubin, A. 1996. "Comparing the Effectiveness of Gender-Specific and Couples Groups in a Court-Mandated Spouse Abuse Treatment Program," Research on Social Work Practice, 6, 4, 405-424.~
~
  Dismantling studies have received considerable attention in recent debates about the comparative effectiveness of two evidence-based interventions for post-traumatic stress disorder (PTSD).~
~
 One intervention is trauma-focused cognitive behavioral therapy (TFCBT) and the other is eye movement desensitization and reprocessing (EMDR).~
~
 Briefly put, EMDR combines various cognitive-behavioral intervention techniques with a technique involving the bilateral stimulation of rapid eye movements.~
~
 Many experiments using the pretest-posttest control group design have found EMDR to be effective.~
~
 However, some dismantling studies found that the same effects could be achieved using the cognitive-behavioral components of EMDR without the eye movements.~
~
 Some reviewers of the dismantling studies have argued that those dismantling results indicate that EMDR is nothing more than TFCBT with an unnecessary bilateral stimulation gimmick.~
~
 Leaders in the EMDR field have fiercely criticized the dismantling studies, and cited other dismantling studies with results that supported the necessity of the bilateral stimulation component.~
~
 Perhaps vested interests and ego involvement influence those on both sides of this debate (Rubin, 2010; Rubin, 2002).~
~
 The box "A Social Work Experiment Evaluating Cognitive-Behavioral Interventions with Parents at Risk of Child Abuse" illustrates the use of a dismantling study in social work.~
~
  Randomization~
~
 It should be clear at this point that the cardinal rule of experimental design is that the experimental and control groups must be comparable.~
~
 Ideally, the control group represents what the experimental group would have been like had it not been exposed to the intervention or other experimental stimulus being evaluated.~
~
 There is no way to guarantee that the experimental and control groups will be equivalent in all relevant respects.~
~
 There is no way to guarantee that they will share exactly the same history and maturational processes or will not have relevant differences before the evaluated intervention is introduced.~
~
 But there is a way to avoid biases in the assignment of clients to groups and to guarantee a high mathematical likelihood that their initial, pretreatment group differences will be insignificant: through random assignment to experimental and control groups, a process also known as randomization.~
~
 Randomization, or random assignment, is not the same as random sampling.~
~
 The research~
~
  A SOCIAL WORK EXPERIMENT EVALUATING COGNITIVE-BEHAVIORAL INTERVENTIONS WITH PARENTS AT RISK OF CHILD ABUSE~
~
 Whiteman, Fanshel, and Grundy (1987) tested the effectiveness of different aspects of a cognitive-behavioral intervention aimed at reducing parental anger in the face of perceived provocation by children in families in which child abuse had been committed or in families at risk for child abuse.~
~
 Fifty-five clients were randomly assigned to four intervention groups and a control group that received no experimental intervention but instead continued to receive services from the referral agency.~
~
 The first intervention group received cognitive restructuring interventions that dealt with the parents' perceptions, expectations, appraisals, and stresses.~
~
 The second intervention group was trained in relaxation procedures.~
~
 The third intervention group worked on problem-solving skills.~
~
 The fourth intervention group received a treatment package comprising the three interventional modalities delivered separately to the first three intervention groups.~
~
 The results revealed no significant differences among the experimental and control groups at pretest.~
~
 At posttest, however, the treated (experimental group) participants had significantly greater reductions in anger than the untreated (control group) participants.~
~
 The intervention group with the greatest reduction in anger was the one that received the composite package of interventions delivered separately to the other three intervention groups.~
~
 In light of their findings, Whiteman and associates recommended that social workers use the composite intervention package to attempt to reduce anger and promote positive child-rearing attitudes among abusive or potentially abusive parents.~
~
 Their results also indicated the importance of the problem-solving skills component in reducing anger, the importance of the cognitive restructuring component in improving child-rearing attitudes, and the relative unimportance of including the relaxation component in the intervention package.~
~
 Source: Whiteman, Martin, David Fanshel, and John F. Grundy. 1987. "Cognitive-Behavioral Interventions Aimed at Anger of Parents at Risk of Child Abuse," Social Work, 32(6), 469-474.~
~
  participants to be randomly assigned are rarely randomly selected from a population.~
~
 Instead, they are individuals who voluntarily agreed to participate in the experiment, a fact that limits the external validity of the experiment.~
~
 Unlike random sampling, which pertains to generalizability, randomization is a device for increasing internal validity.~
~
 It does not seek to ensure that the research participants are representative of a population; instead, it seeks to reduce the risks that experimental group participants are not representative of control group participants.~
~
 The principal technique of randomization simply entails using procedures based on probability theory to assign research participants to experimental and control groups.~
~
 Having recruited, by whatever means, the group of all participants, the researchers might flip a coin to determine to which group each participant is assigned; or researchers may number all of the participants serially and assign them by selecting numbers from a random numbers table (such as the one in Appendix B); or researchers may put the odd-numbered participants in one group and put the even-numbered ones in the other.~
~
 In randomization the research participants are our study population that we randomly divide into two samples.~
~
 As we will see in Chapter 14, on sampling, if the number of research participants involved is large enough it is reasonable to expect that the various characteristics of the participants will be distributed in an approximately even manner between the two groups, thus making the two groups comparable.~
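~
 A minimal sketch of one such procedure follows, using Python's standard random module in place of a coin or a random numbers table. The function name randomly_assign and the participant labels are hypothetical. Shuffling and splitting is a choice made here because it yields groups of equal size, which independent coin flips would not guarantee.~
~
```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle the recruited participants and split them into two groups.

    `seed` is optional and only makes the illustration reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}


# Hypothetical participants, numbered serially as the text suggests.
participants = [f"Participant {number}" for number in range(1, 11)]
groups = randomly_assign(participants, seed=42)
print("Experimental:", groups["experimental"])
print("Control:", groups["control"])
```
~
 With an odd number of participants, this sketch places the extra person in the control group.~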
~
  Matching~
~
 Although randomization is the best way to avoid bias in assigning research participants to groups and increasing the likelihood that the two groups will be comparable, it does not guarantee full comparability.~
~
 One way to further improve the chances of obtaining comparable groups is to combine randomization with matching, in which pairs of participants are matched on the basis of their similarities on one or more variables, and one member of the pair is then randomly assigned to the experimental group and the other to the control group.~
~
 Matching can also be done without randomization, as we will see in Chapter 11 when we discuss quasi-experiments, but we only have a true experiment when matching is combined with randomization.~
~
 Matching without randomization does not control for all possible biases in who gets assigned to which group.~
~
 To illustrate matching with randomization, suppose 12 of your research participants are young white men.~
~
 You might assign 6 of those at random to the experimental group and the other 6 to the control group.~
~
 If 14 research participants are middle-aged African American women, you might randomly assign 7 to each group.~
~
 The overall matching process could be most efficiently achieved through the creation of a quota matrix constructed of all the most relevant characteristics.~
~
 (Figure 10-3 provides a simplified illustration of such a matrix.) Ideally, the quota matrix would be constructed to result in an even number of research participants in each cell of the matrix.~
~
 Then, half of the research participants in each cell would randomly go into the experimental group and half into the control group.~
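~
 The quota-matrix procedure might be sketched as follows, reusing the 12 young white men and 14 middle-aged African American women from the illustration above. The function name assign_within_cells and the dictionary field names are assumptions made for this example.~
~
```python
import random
from collections import defaultdict


def assign_within_cells(participants, cell_keys, seed=None):
    """Randomly split each cell of the quota matrix between the two groups."""
    rng = random.Random(seed)
    cells = defaultdict(list)
    for person in participants:
        # A cell is defined by the combination of matching characteristics.
        cells[tuple(person[key] for key in cell_keys)].append(person)

    groups = {"experimental": [], "control": []}
    for members in cells.values():
        rng.shuffle(members)
        half = len(members) // 2
        groups["experimental"].extend(members[:half])
        groups["control"].extend(members[half:])
    return groups


# Hypothetical participants matching the example in the text.
participants = (
    [{"id": i, "age": "young", "race": "white", "gender": "man"} for i in range(12)]
    + [{"id": 12 + i, "age": "middle-aged", "race": "African American", "gender": "woman"}
       for i in range(14)]
)
groups = assign_within_cells(participants, ["age", "race", "gender"], seed=3)
print(len(groups["experimental"]), len(groups["control"]))
```
~
 Because every cell here holds an even number of people, each group ends up with 6 of the men and 7 of the women. A cell with an odd count would send its extra member to the control group in this sketch.~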
~
 Alternatively, you might recruit more research participants than are required by your experimental design.~
~
 You might then examine many characteristics of the large initial group of research participants.~
~
 Whenever you discover a pair of highly similar participants, you might assign one at random to the experimental group and the other to the control group.~
~
 Potential participants who were unlike anyone else in the initial group might be left out of the experiment altogether.~
~
 Whatever method is used, the desired result is the same.~
~
 The overall average description of the experimental group should be the same as that of the control group.~
~
 For instance, they should have about the same average age, the same gender composition, the same racial composition, and so forth.~
~
 As a general rule, the two groups should be comparable in terms of those variables that are likely to be related to the dependent variable under study.~
~
 In a study of gerontological social work, for example, the two groups should be alike in terms of age, gender, ethnicity, and physical and mental health, among other variables.~
~
 In some cases, moreover, you may delay assigning research participants to experimental and control groups until you have initially measured the dependent variable.~
~
 Thus, for instance, you might administer a questionnaire that measures participants' psychosocial functioning and then match the experimental and control groups to assure yourself that the two groups exhibited the same overall level of functioning before intervention.~
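~
 One way to combine pretest-based matching with randomization is to rank participants by pretest score, pair neighbors in the ranking, and randomly assign one member of each pair to each group. The sketch below does this with invented functioning scores; matched_pair_assignment is a hypothetical name, and pairing adjacent ranks is just one simple matching rule assumed here.~
~
```python
import random


def matched_pair_assignment(scores, seed=None):
    """Pair participants with adjacent pretest scores, then split each pair at random.

    `scores` maps a participant label to a pretest score. With an odd number of
    participants, the highest-ranked one is left unmatched and excluded.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)
    groups = {"experimental": [], "control": []}
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # random assignment within the matched pair
        groups["experimental"].append(pair[0])
        groups["control"].append(pair[1])
    return groups


# Hypothetical psychosocial functioning scores measured before the intervention.
pretest_scores = {"Ana": 41, "Ben": 58, "Cal": 43, "Dee": 60, "Eli": 75, "Fay": 72}
groups = matched_pair_assignment(pretest_scores, seed=7)
print(groups)
```
~
 Each group receives one member of every matched pair, so the two groups start with similar overall levels of functioning while the assignment within each pair remains random.~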
~
  Providing Services to Control Groups~
~
 You may recall reading in Chapter 4 that the withholding of services from people in need raises ethical concerns.~
~
 It may also be unacceptable to agency administrators, who fear bad publicity or the loss of revenues based on service delivery hours.~
~
 We must therefore point out that when we discuss withholding the intervention being tested from the control group, we do not mean that people in the control group should be denied services.~
~
 We simply mean that they should not receive the experimental intervention that is being tested during the period of the test.~
~
 When experiments are feasible to carry out in social work settings, control group participants are likely to receive the usual, routine services provided by an agency.~
~
 Experimental group participants will receive the new, experimental intervention being tested, perhaps in addition to the usual, routine services.~
~
 Thus, the experiment may determine whether services that include the new intervention are more effective than routine services, rather than attempt to ascertain whether the new intervention is better than no service.~
~
 Moreover, control group participants may be put at the top of a waiting list to receive the new intervention once the experiment is over.~
~
 If the results of the experiment show that the tested intervention is effective, or at least is not harmful, it can then be offered to control group participants.~
~
 The researcher may also want to measure whether control group participants change in the desired direction after they receive the intervention.~
~
 The findings of this measurement can buttress the main findings of the experiment.~
~
  ADDITIONAL THREATS TO THE VALIDITY OF EXPERIMENTAL FINDINGS~
~
 So far, we have seen how the logic of experimental designs can control for most threats to internal validity.~
~
 Additional threats to the validity of the conclusions we draw from experiments require methodological efforts that go beyond their design logic.~
~
 Let's now look at each of these additional threats and the steps that can be taken to alleviate them.~
~
  Measurement Bias~
~
 No matter how well an experiment controls for other threats to internal validity, the credibility of its conclusions can be damaged severely if its measurement procedures appear to have been biased.~
~
 Suppose, for example, that a clinician develops a new therapy for depression that promises to make her rich and famous and then evaluates her invention in an experiment by using her own subjective clinical judgment to rate improvement among the experimental and control group participants, knowing which group each participant is in.~
~
 Her own ego involvement and vested interest in wanting the experimental group participants to show more improvement would make her study so vulnerable to measurement bias that her "findings" would have virtually no credibility.~
~
 Although this example may seem extreme, serious measurement bias is not as rare in experimental evaluations as you might imagine.~
~
 It is not difficult to find reports of otherwise well-designed experiments in which outcome measures were administered or completed by research assistants who knew the study hypothesis, were aware of the hopes that it would be confirmed, and knew which group each participant was in.~
~
 Whenever measurement of the dependent variable involves using research staff to supply ratings (either through direct observation or interviews), the individuals who supply the ratings should not know the experimental status of the participants they are rating.~
~
 The same principle applies when the people supplying the ratings are practitioners who are not part of the research staff but who still might be biased toward a particular outcome.~
~
 In other words, they should be "blind" as to whether any given rating refers to someone who has received the experimental stimulus (or service) or someone who has not.~
~
 The term blind ratings (or blind raters) means that the study has controlled for the potential, and perhaps unconscious, bias of raters toward perceiving results that would confirm the hypothesis.~
~
 Likewise, whenever researchers fail to inform you that such ratings were blind, you should be skeptical about the study's validity.~
~
 No matter how elegant the rest of a study's design might be, its conclusions are suspect if results favoring the experimental group were provided by raters who might have been biased.~
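 One way to keep raters blind can be sketched in code. The following is a minimal Python sketch, assuming a hypothetical workflow in which a coordinator who does no rating replaces each participant's identity and group with an opaque code, hands raters only the shuffled codes, and holds the key until all ratings are in. The function names make_blind_rating_sheet and unblind are illustrative and do not come from any library.~
~
```python
import random


def make_blind_rating_sheet(participants, seed=0):
    """Build a rating sheet that hides group membership from raters.

    participants: list of (participant_id, group) tuples.
    Returns (sheet, key): sheet is a list of opaque codes in shuffled
    order; key maps each code back to (participant_id, group) and is
    kept by someone who does not supply ratings.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # order should not hint at group membership
    sheet, key = [], {}
    for i, (pid, group) in enumerate(shuffled, start=1):
        code = f"R{i:03d}"
        key[code] = (pid, group)
        sheet.append(code)
    return sheet, key


def unblind(ratings, key):
    """Attach group labels to completed ratings (code -> score)."""
    by_group = {}
    for code, score in ratings.items():
        _, group = key[code]
        by_group.setdefault(group, []).append(score)
    return by_group
```
~
 In this sketch the raters see only codes such as R001, so a rating cannot be shaded by knowledge of who received the experimental stimulus. The sketch does not address cues that participants themselves may give during an interview.~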
~
 The use of blind raters, unfortunately, is often not feasible in social work research studies.~
~
 When we are unable to use them, we should look for alternative ways to avoid rater bias.~
~
 For example, we might use validated self-report scales to measure the dependent variable rather than rely on raters who may be biased.~
~
 But even when such scales are used, those administering them can bias the outcome.~
~
 For example, if participants ask questions about scale items that confuse them, biased testers might consciously or unconsciously respond in ways that predispose participants to answer scale items in ways that are consistent with the desired outcome.~
~
 We have even heard of situations where biased testers-in providing posttest instructions-encouraged experimental group participants to answer "valid" scales in ways that would show how much they improved since the pretest.~
~
 The term research reactivity refers to changes in outcome data that are caused by researchers or research procedures rather than the independent variable.~
~
 Let's now look at the various ways in which research reactivity can threaten the validity of experimental findings.~
~
  Research Reactivity~
~
 Biasing comments by researchers during data collection is just one of many forms of research reactivity.~
~
 Two related terms that are used to refer to that sort of reactivity are experimental demand characteristics and experimenter expectancies.~
~
 Research participants learn what experimenters want them to say or do, and then they cooperate with those "demands" or expectations.~
~
 Demand characteristics and experimenter expectancies can appear in more subtle ways as well.~
~
 Some therapists who treat traumatized clients, for example, will repeatedly ask the client at different points during therapy sessions to rate on a scale from 0 to 10 how much distress they are feeling during therapy when they call up a mental image of the traumatic event.~
~
 Through the therapist's verbal communication as well as nonverbal communication (smiles or looks of concern, for example), the client can learn that the therapist hopes the rating number will diminish over the course of therapy.~
~
 Some studies evaluating trauma therapy administer the same 0-10 rating scale at pretest and posttest that the therapist administers throughout treatment.~
~
 Even if the pretests and posttests are administered by research assistants who are unaware of clients' experimental group status, clients will have learned from the therapist that they are expected to report lower distress scores at posttest than at pretest.~
~
 Worse yet, in some studies it is the therapist herself who administers the same 0-10 scale at posttest that she has been using repeatedly as part of the therapy.~
~
 One way to alleviate the influence of experimenter expectancies and demand characteristics is to separate the measurement procedures from the treatment procedures.~
~
 Another way is to use measurement procedures that are hard for practitioners or researchers to influence.~
~
 Instead of using the above 0-10 scale at pretest and posttest, for example, a research assistant could administer physiological measures of distress (such as pulse rate) while the client thinks of the traumatic event.~
~
 It would also help if the assistants administering pretest and posttest scales were blind as to the study's hypothesis or the experimental status of the participants, to avoid giving cues about expected outcomes (Shadish, Cook, and Campbell, 2001).~
~
 Sometimes we can use raters or scale administrators who are not blind but do not seem likely to be biased.~
~
 We may, for instance, ask teachers to rate the classroom conduct of children who receive two different forms of social work intervention.~
~
 The teachers may know which intervention each student is receiving but not have much technical understanding of the interventions or any reason to favor one intervention over another.~
~
 A related option is to directly observe and quantify the actual behavior of participants in their natural setting rather than rely on their answers to self-report scales or on someone's ratings.~
~
 It matters a great deal, however, whether that observation is conducted in an obtrusive or unobtrusive manner.~
~
 Obtrusive observation occurs when the participant is keenly aware of being observed and thus may be predisposed to behave in ways that meet experimenter expectancies.~
~
 In contrast, unobtrusive observation means that the participant does not notice the observation.~
~
 Suppose an experiment is evaluating the effectiveness of a new form of therapy in reducing the frequency of antisocial behaviors among children in a residential treatment center.~
~
 If the child's therapist or the researcher starts showing up with a pad and pencil to observe the goings-on in the child's classroom or cottage, he or she might stick out like a sore thumb and make the child keenly aware of being observed.~
~
 That form of observation would be obtrusive, and the child might exhibit atypically good behavior during observation.~
~
 A more unobtrusive option would be to have teachers or cottage parents tabulate the number of antisocial behaviors of the child each day.~
~
 Their observation would be less noticeable to the child because they are part of the natural setting and being observed by a teacher or cottage parent is part of the daily routine and not obviously connected to the expectations of a research study.~
~
 Whenever we are conducting experimental research (or any other type of research) and we are unable to use blind raters, blind scale administrators, unobtrusive observation, or some other measurement alternative that we think is relatively free of bias, we should try to use more than one measurement alternative, relying on the principle of triangulation, as discussed in Chapter 8.~
~
 If two or more measurement strategies, each vulnerable to different biases, produce the same results, then we can have more confidence in the validity of those results.~
~
 Another form of research reactivity can occur when the research procedures don't just influence participants to tell us what they think we want to hear in response to our measures, but when the measures themselves produce desired changes.~
~
 For instance, suppose as part of the research data-collection procedures to measure outcome, participants in a parent education intervention self-monitor how much time they spend playing with or holding a friendly conversation with their children.~
~
 That means that they will keep a running log, recording the duration of every instance that they play with or hold a conversation with their child.~
~
 Keeping such a log might make some parents realize that they are spending much less quality time with their children than they had previously thought.~
~
 This realization might influence them to spend more quality time with their children; in fact, it might influence them to do so more than the parent education intervention did.~
~
 It is conceivable that desired changes might occur among experimental group participants simply because they sense they are getting special attention or special treatment.~
~
 To illustrate this form of reactivity, suppose a residential treatment center for children conducts an experiment to see if a new recreational program will reduce the frequency of antisocial behaviors among the children.~
~
 Being assigned to the experimental group might make some children feel better about themselves and about the center.~
~
 If this feeling-and not the recreational program per se-causes the desired change in their behavior, then a form of research reactivity will have occurred.~
~
 This form of reactivity has been termed novelty and disruption effects because introducing an innovation in a setting where little innovation has previously occurred can stimulate excitement, energy, and enthusiasm among recipients of the intervention (Shadish, Cook, and Campbell, 2001).~
~
 A similar form of reactivity is termed placebo effects.~
~
 Placebo effects can be induced by experimenter expectancies.~
~
 If experimental group participants get the sense that they are about to receive a special new treatment that researchers or practitioners expect to be very effective, then the mere power of suggestion-and not the treatment itself-can bring about the desired improvement.~
~
 If we are concerned about potential placebo effects or novelty and disruption effects and wish to control for them, then we could employ an experimental design called the placebo control group design.~
~
 The shorthand notation for this design is [notation not shown]. This design randomly assigns clients to three groups: an experimental group and two different control groups.~
~
 One control group receives no experimental stimulus, but the other receives a placebo (represented by the P in the preceding notation).~
~
 Placebo group subjects would receive special attention of some sort other than the tested stimulus or intervention.~
~
 Perhaps practitioners would meet regularly to show special interest in them and listen to them but without applying any of the tested intervention procedures.~
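 The random assignment step of a placebo control group design can be sketched as follows. This is a minimal Python sketch, assuming equal-sized groups and a hypothetical list of client identifiers. The function name assign_three_groups and the group labels are illustrative choices, not part of the design's formal definition.~
~
```python
import random


def assign_three_groups(client_ids, seed=None):
    """Randomly assign clients to three groups of near-equal size.

    Groups (labels are illustrative):
      experimental - receives the tested intervention
      placebo      - receives special attention but not the intervention
      no_treatment - receives neither
    """
    rng = random.Random(seed)
    ids = list(client_ids)
    rng.shuffle(ids)  # chance alone decides each client's group
    groups = ("experimental", "placebo", "no_treatment")
    assignment = {g: [] for g in groups}
    for i, cid in enumerate(ids):
        # Deal the shuffled clients out in turn, like cards.
        assignment[groups[i % 3]].append(cid)
    return assignment
```
~
 Comparing the placebo group with the no-treatment group then gives a rough sense of how much change special attention alone produces, and comparing the experimental group with the placebo group suggests how much the intervention adds beyond that attention.~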
~
 Placebo control group designs pose complexities from both a planning and interpretation standpoint, particularly when experimental interventions contain elements that resemble placebo effects.~
~
 For example, in some interventions that emphasize constructs such as "empathy" and "unconditional positive regard," intervention effects are difficult to sort from placebo effects.~
~
 But when they are feasible to use, placebo control group designs provide greater control for threats to the validity of experimental findings than do designs that use only one control group.~
~
 Ethical concern about deceiving research participants makes the use of placebos rare in social work experiments.~
~
 However, sometimes a second control group can receive treatment as usual as a way to compare the outcome of a group receiving a new intervention to the outcome of a group that receives the somewhat special attention of the practitioners providing the less innovative intervention.~
~
 A study published in a social work research journal that took such an approach is summarized in the box "A Social Work Dissertation that Evaluated the Effectiveness of EMDR."~
~
 We do not want to convey the impression that an experiment's findings lack credibility unless the experiment can guarantee the complete absence of any possible research reactivity or measurement bias.~
~
 It is virtually impossible for experiments in social work or allied fields to meet that unrealistic standard.~
~
 Instead, the key issue should be whether reasonable efforts were taken to avoid or minimize those problems and whether or not the potential degree of bias or reactivity seems to be at an egregious level.~
~
 That said, let's move on to a different type of threat to the validity of experimental findings.~
~
  Diffusion or Imitation of Treatments~
~
 Sometimes, service providers or service recipients are influenced unexpectedly in ways that tend to diminish the planned differences in the way a tested intervention is implemented among the groups being compared.~
~
 This phenomenon is termed diffusion, or imitation, of treatments.~
~
 For instance, suppose the effects of hospice services that emphasize palliative care and psychosocial support for the terminally ill are being compared to the effects of more traditional health care providers, who historically have been more attuned to prolonging life and thus less concerned with the unpleasant physical and emotional side effects of certain treatments.~
~
 Over time, traditional health care providers have been learning more about hospice care concepts, accepting them, and attempting to implement them in traditional health care facilities.~
~
 With all of this diffusion and imitation of hospice care by traditional health care providers, failure to find differences in outcome between hospice and traditional care providers may have more to do with unanticipated similarities between hospice and traditional providers than with the ineffectiveness of hospice concepts.~
~
 A similar problem complicates research that evaluates the effectiveness of case management services.~
~
 Many social workers who are not called case managers nevertheless conceptualize and routinely provide case management functions-such as outreach, brokerage, linkage, and advocacy-as an integral part of what they learned to be good and comprehensive direct social work practice.~
~
 Consequently, when outcomes for clients referred to case managers are compared to the outcomes of clients who receive "traditional" social services, the true effects of case management as a treatment approach may be blurred by the diffusion of that approach among practitioners who are not called case managers.~
~
 In other words, despite their different labels, the two treatment groups may not be as different in the independent variable as we think they are.~
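 The blurring described above can be illustrated with simple arithmetic. The sketch below is in Python and rests on a deliberately simplified, hypothetical model: every experimental group client receives the approach, some fraction of comparison group clients also receives it through diffusion, and the approach has the same effect on everyone who receives it. The function name and parameters are illustrative only.~
~
```python
def expected_observed_difference(true_effect, control_uptake):
    """Expected between-group difference when the treatment diffuses.

    true_effect:    effect of the approach on a client who receives it
    control_uptake: fraction (0.0 to 1.0) of comparison group clients
                    who end up receiving the approach anyway

    Assumes full delivery in the experimental group and a constant,
    additive effect (both are simplifying assumptions).
    """
    if not 0.0 <= control_uptake <= 1.0:
        raise ValueError("control_uptake must be between 0 and 1")
    return true_effect * (1.0 - control_uptake)
```
~
 Under these assumptions, a true effect of 10 points shrinks to an expected observed difference of about 4 points when 60 percent of the comparison group receives the approach, and it vanishes entirely when all of them do.~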
~
 Preventing the diffusion or imitation of treatments can be difficult.~
~
 Shadish, Cook, and Campbell (2001) suggest separating the two treatment conditions as much as possible, either geographically or by using different practitioners in each.~
~
 Another possibility is to provide ongoing reminders to practitioners about the need not to imitate the experimental group intervention when seeing control group clients.~
~
 To monitor the extent to which the imitation of treatment is occurring or has occurred, researchers can utilize qualitative methods (see Chapters 17 and 18) to observe staff meetings, conduct informal conversational interviews with practitioners and clients, and ask them to keep logs summarizing what happened in each treatment session.~
~
 If these efforts detect imitation while the experiment is still under way, then further communication with practitioners may help alleviate the problem and prevent it from reaching a level that seriously undermines the validity of the experiment.~
~
  Compensatory Equalization, Compensatory Rivalry, or Resentful Demoralization~
~
 Suppose you conduct an experiment to see if increasing the involvement of families in the treatment of substance abusers improves treatment effectiveness.~
~
 Suppose the therapists in one unit receive special training in working with families and are instructed to increase the treatment involvement of families of clients in their unit, while the therapists in another unit receive no such training or instructions.~
~
 Assuming that the staff in the latter unit-and perhaps even their clients and the families of their clients-are aware of the treatment differences, they may seek to offset what they perceive as an inequity in service provision.~
~
 The staff in the latter unit therefore might decide to compensate for the inequity by providing enhanced services that go beyond the routine treatment regimen for their clients.~
~
 This is termed compensatory equalization.~
~
 If compensatory equalization happens, the true effects of increasing family involvement could be blurred, as described above with diffusion or imitation of treatments.~
~
 What if the therapists not receiving family therapy training in the above example decide to compete with the therapists in the other unit who do receive the training? Perhaps they will feel their job security or prestige is threatened by not receiving the special training and will try to show that they can be just as effective without the special training.~
~
 They may start reading more, attending more continuing education workshops, and increasing their therapeutic contact with clients.~
~
 This is called compensatory rivalry.~
~
 The control group therapists' extra efforts might increase their effectiveness as much as the increased family involvement might have increased the effectiveness of the experimental group therapists.~
~
 If so, this could lead to the erroneous impression that the lack of difference in treatment outcome between the two groups means that increasing family involvement did not improve treatment effectiveness.~
~
 The same problem could occur if the clients in one group become more motivated to improve because of the rivalry engendered by their awareness that they are not receiving the same treatment benefits as another group.~
~
 The converse of compensatory rivalry is resentful demoralization.~
~
 This occurs when staff or clients become resentful and demoralized because they did not receive the special training or the special treatment.~
~
 Consequently, their confidence or motivation may decline and may explain their inferior performance on outcome measures.~
~
 To detect whether compensatory equalization, compensatory rivalry, or resentful demoralization is occurring-and perhaps intervene to try to minimize the problem-you can use qualitative methods such as participant observation of staff meetings and informal conversational interviews with clients and practitioners.~
~
  Attrition (Experimental Mortality)~
~
 Let's now look at one more threat to the validity of experimental findings: attrition, which is sometimes referred to as experimental mortality.~
~
 Often participants will drop out of an experiment before it is completed, and the statistical comparisons and conclusions that are drawn can be affected.~
~
 In a pretest-posttest control group design evaluating the effectiveness of an intervention to alleviate a distressing problem, for example, suppose that experimental group participants who perceive no improvement in their target problem prematurely drop out of treatment and refuse to be posttested.~
~
 At posttest, the only experimental group participants left would be those who felt they were improving.~
~
 Suppose the overall group rate of perceived improvement among control group participants is exactly the same as the overall rate among those assigned to the experimental group (including the dropouts), but all of the nonrecipients agree to be posttested because none had been disappointed.~
~
 The experimental group's average posttest score is likely to be higher than the control group's-even if the intervention was ineffective-merely because of the attrition (experimental mortality) of experimental group participants who perceived no improvement.~
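 The scenario above can be worked through with a small numerical example. The Python sketch below uses made-up improvement scores and assumes, for illustration only, that both groups have identical scores (an ineffective intervention) and that every experimental group participant whose score is zero or lower drops out before the posttest. The function name posttest_means and the data are hypothetical.~
~
```python
from statistics import mean


def posttest_means(experimental, control):
    """Compare group means when non-improvers leave the experimental group.

    Each argument is a list of improvement scores. Experimental group
    participants who perceive no improvement (score <= 0) are assumed
    to drop out and go unmeasured; all control participants are tested.
    """
    completers = [score for score in experimental if score > 0]
    return mean(completers), mean(control)


# Identical (hypothetical) scores in both groups: no true effect.
scores = [-2, -1, 0, 0, 1, 2, 3, 5]
exp_mean, ctl_mean = posttest_means(scores, scores)
print(exp_mean, ctl_mean)
```
~
 With these numbers the experimental group's posttest mean is 2.75 and the control group's is 1, even though the two groups started with exactly the same scores. The apparent advantage comes entirely from who remained to be measured.~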
~
 As another example, consider an evaluation that compares the effectiveness of family therapy and discussion groups in the treatment of drug addiction.~
~
 Shadish, Cook, and Campbell (2001) point out that addicts with the worst prognoses are more likely to drop out of discussion groups than they are to drop out of family therapy.~
~
 Consequently, the family therapy intervention may have poorer results at posttest, not because it is less effective than discussion groups, but because the different attrition rates left more difficult cases in the family therapy group at posttest.~
~
 Even if the dropouts agreed to be posttested, that would not resolve the attrition problem.~
~
 The dilemma in this instance would be that the experimental group's overall average outcome score will not be an accurate depiction of intervention effects because it will have been influenced by the scores of people who did not complete the family therapy intervention.~
~
 Consequently, the family therapy intervention still may have poorer results at posttest, not because it is less effective than discussion groups, but because it had more cases that failed to complete the course of therapy.~
~
 Researchers conducting experimental evaluations of the effectiveness of practice or programs should strive to minimize attrition.~
~
 Here are ways to do that:~
~
 1. Reimbursement.~
~
 Reimbursing participants for their participation in research might not only alleviate attrition but also enhance your ability to recruit people to participate in your study at the outset.~
~
 The level of reimbursement should be sensitive to the time and efforts of participants in pretesting and posttesting.~
~
 The payment should be large enough to work as an incentive without being so great that it becomes coercive.~
~
 The amount should fit the difficulties that clients experience in participating as well as fit their income levels and emotional states.~
~
 With low-income participants, for example, you should anticipate difficulties in child care and transportation to and from pretesting and posttesting (and perhaps follow-up testing).~
~
 If feasible, an alternative to extra payments for transportation and child-care costs might be to provide the transportation to the testing site, as well as a small child-care service there.~
~
 Alternatively, it might make sense to conduct the testing at the participant's residence if doing so does not introduce serious measurement biases.~
~
 After pretesting, you might want to increase the amount of reimbursement over time at each subsequent measurement point and give a bonus to participants who stay the distance and complete all of the measurements throughout the study.~
~
 The amount should go beyond the transportation and other costs to the participant and should be enough to acknowledge to participants that their time is valued and that the measurement can be an imposition.~
~
 The amount should also fit within your research budget.~
~
 If you can afford it, paying participants $15 to complete pretests, $20 to complete posttests, and perhaps another $15 bonus to stay the distance might be reasonable amounts, but they might have to be adjusted upward depending on factors such as child-care costs and inflation.~
~
 (Discount department-store gift certificates in the above amounts are commonly used instead of cash payments.)~
~
 2. Avoid intervention or research procedures that disappoint or frustrate participants.~
~
 Participants are more likely to drop out of an experiment if they are disappointed or frustrated with the intervention they are receiving as part of the study.~
~
 Of course, there's not much you can do to prevent disappointment over the fact that the intervention simply is not effective.~
~
 But you can try to have the intervention delivered by the most experienced, competent professionals possible.~
~
 Of particular importance is their experience and ability in developing supportive professional relationships with clients.~
~
 In contrast, if your intervention is delivered by inexperienced practitioners who are not yet comfortable or confident in building and maintaining treatment alliances with clients, then participants receiving the intervention are more likely to become put off and drop out.~
~
 Another way to prevent disappointment and frustration with the intervention is to make sure during the recruitment and orientation of participants that they have accurate expectations of the intervention and that the intervention is a good fit with their treatment objectives and expectations.~
~
 It also helps if the intervention itself does not contain noxious procedures, such as having participants recall repressed traumatic memories in ways that are like reexperiencing the trauma and then ending sessions without resolving the intense distress that the recalled memory has stimulated.~
~
 Finally, minimizing the amount of time that elapses between recruitment of participants and the onset of treatment can help avoid attrition because participants assigned to a particular treatment group may become disappointed and frustrated if the wait is much longer than they had expected.~
~
 Annoying research procedures can also influence participants to drop out.~
~
 Common examples are overwhelming participants with measurement procedures that exceed their expectations, stamina, or resources.~
~
 Researchers should not mislead prospective participants by underestimating the extent of the measurement procedures to which they will be subjected.~
~
 Neither should researchers mislead participants about issues such as child-care requirements or resources, scheduling difficulties, providing feedback about measurement scores, or protecting confidentiality.~
~
 Shadish, Cook, and Campbell (2001) encourage researchers to conduct preliminary pilot studies to identify causes of attrition that can be anticipated and alleviated when the main experiment begins.~
~
 For example, research assistants who are not involved in other parts of the pilot study could interview dropouts to ascertain how they experienced the study and why they dropped out.~
~
 3. Utilize tracking methods.~
~
 Many recipients of social work interventions are transient or secretive about where they live.~
~
 Many are unemployed.~
~
 Some lack telephones.~
~
 The poor, the homeless, substance abusers, and battered women are prominent examples.~
~
 Shadish, Cook, and Campbell (2001) review the tracking strategies that have been used to find such participants and retain their participation in treatment and measurement.~
~
 One of their recommendations is to obtain as much location information as possible at the outset of their participation, not only from the participants themselves, but also from their friends, relatives, and other agencies with which they are involved.~
~
 (You will need to get the participant's signed permission to contact these sources.)~
~
 Another recommendation is to develop relationships with staff members at agencies who may later be able to help you find participants.~
~
 You can also give participants a business card that shows treatment and measurement appointment times and a toll-free number where they can leave messages about appointment changes or changes in how to locate them.~
~
 If participants have telephones, then research assistants can call them to remind them of each appointment.~
~
 If they have mailing addresses, you can augment your telephone tracking by mailing them reminder notices of upcoming appointments.~
~
 (You may recall that we discussed these tracking methods in more depth in Chapter 5, on culturally competent research.)~
~
  EXTERNAL VALIDITY~
~
 When a study has a high degree of internal validity, it allows causal inferences to be made about the sample and setting that were studied.~
~
 But what about other settings and larger populations? Can we generalize the same causal inferences to them? As we mentioned earlier, external validity refers to the extent to which we can generalize the findings of a study to settings and populations beyond the study conditions.~
~
 Internal validity is a necessary but not sufficient condition for external validity.~
~
 Before we can generalize a causal inference beyond the conditions of a particular study, we must have adequate grounds for making the causal inference under the conditions of that study in the first place.~
~
 But even when internal validity in a particular study is high, several problems may limit its external validity.~
~
 A major factor that influences external validity is the representativeness of the study sample, setting, and procedures.~
~
 Suppose a deinstitutionalization program is implemented in an urban community.~
~
 Suppose that community residents strongly support the program, that it's well funded, and that there is a comprehensive range of noninstitutional community support resources accessible to the mentally disabled clients residing in the community.~
~
 Suppose, in turn, that the well-funded program can afford to hire high-caliber staff members, give them small caseloads, and reward them amply for good work.~
~
 Finally, suppose that an evaluation with high internal validity finds that the program improves the clients' quality of life.~
~
 Would those findings imply that legislators or mental health planners in other localities could logically conclude that a similar deinstitutionalization program would improve the quality of life of mentally disabled individuals in their settings? Not necessarily.~
~
 It would depend on the degree to which their settings, populations, and procedures matched those of the studied program.~
~
 Suppose their community is rural, has fewer or more geographically dispersed community-based resources for the mentally disabled, or has more neighborhood opposition to residences being located in the community.~
~
 Suppose legislators view deinstitutionalization primarily as a cost-saving device and therefore do not allocate enough funds to enable the program to hire or keep high-caliber staff members or give them caseload sizes that are small enough to manage adequately.~
~
 And what about differences in the characteristics of the mentally disabled target population? Notice that we have said nothing about the attributes of the clients in the tested program.~
~
 Perhaps they differed in age, diagnosis, ethnicity, average length of previous institutionalization, and degree of social impairment from the intended target population in the communities generalizing from the study findings.~
~
 To the extent that such differences apply, similar programs implemented in other settings might not have the same effects as did the program in the tested setting.~
~
 Would such differences mean that this study had low external validity? Not necessarily.~
~
 On the one hand, we could say that a study has low external validity if its conditions are far removed from conditions that could reasonably be expected to be replicated in the real world.~
~
 On the other hand, a study's external validity could be adequate even if it cannot be generalized to many other settings.~
~
 A study must be generalizable to some real-world settings, and it must represent that which it intends to represent.~
~
 It does not have to represent every conceivable population or setting.~
~
 For example, a study that evaluates a program of care for the profoundly and chronically disabled in rural settings does not need to be generalizable to the mildly or acutely disabled or to the disabled residing in urban settings in order to have external validity.~
~
 It just has to be representative of those attributes that it intends to represent, no matter how narrowly it defines them.~
~
 Problems in external validity abound in the literature that evaluates social work practice and programs.~
~
 One common problem that limits external validity is ambiguity or brevity in reportage.~
~
 Many studies do not adequately articulate the specific attributes of the clients who participated in the evaluated service.~
~
 Many are vague about the practitioners' attributes.~
~
 Some studies generalize about the effectiveness of professional social work practitioners based on findings about the effectiveness of student practitioners.~
~
 Some studies leave out important details about the evaluated clinical setting, such as caseload size and the like.~
~
 Consequently, although it may be clear that the evaluated intervention did or did not cause the desired change among the studied clients-that is, that the study had high internal validity-it is often not clear to whom those findings can be generalized.~
~
 Thus, some studies find services to be effective but do not permit the generalization that those services would be effective beyond the study conditions.~
~
 Likewise, other studies find no support for the effectiveness of services, but do not permit the generalization that those services would be ineffective when implemented under other conditions.~
~
 As we leave this chapter, we hope you can begin to sense how difficult it may be to carry out successfully a well-controlled experiment in a real social work setting.~
~
 Many practical obstacles, some of which are impossible to foresee, can interfere with our best-laid plans.~
~
 We have addressed some obstacles in this chapter, such as participant attrition or practitioner imitation of treatments.~
~
 We will address additional pitfalls in the next chapter, such as improper implementation of the intervention being evaluated, difficulties in recruiting participants, and practitioner resistance to research procedures for assigning cases to experimental and control conditions.~
~
 In particular, the next chapter will focus on how quasi-experimental designs attempt to achieve a reasonable degree of internal validity when agency obstacles make it impossible to use randomization procedures in assigning participants to experimental and control groups.~
~
  Main Points~
~
 • An inference is a conclusion that can be logically drawn in light of our research design and our findings.~
~
 • A causal inference is one derived from a research design and findings that logically imply that the independent variable really has a causal impact on the dependent variable.~
~
 • The term research design can refer to all the decisions made in planning and conducting research, including decisions about measurement, sampling, how to collect data, and logical arrangements designed to permit certain kinds of inferences.~
~
 • There are three basic criteria for the determination of causation in scientific research: (1) The independent (cause) and dependent (effect) variables must be empirically related to each other, (2) the independent variable must occur earlier in time than the dependent variable, and (3) the observed relationship between these two variables cannot be explained away as being due to the influence of some third variable that causes both of them.~
~
 • Internal validity refers to the confidence we have that the results of a study accurately depict whether one variable is or is not a cause of another.~
~
 • Common threats to internal validity are history, maturation, testing, instrumentation changes, statistical regression, selection bias, and causal time order.~
~
 • Three forms of pre-experiments are the one-shot case study, the one-group pretest-posttest design, and the posttest-only design with nonequivalent groups.~
~
 • Experiments are an excellent vehicle for the controlled testing of causal processes.~
~
 • The classical experiment tests the effect of an experimental stimulus on some dependent variable through the pretesting and posttesting of experimental and control groups.~
~
 • The Solomon four-group design and the posttest-only control group design are variations on the classical experiment that attempt to safeguard against problems associated with testing effects.~
~
 • Randomization is the generally preferred method for achieving comparability in the experimental and control groups.~
~
 • It is generally less important that a group of experimental subjects be representative of some larger population than that experimental and control groups be similar to one another.~
~
 • Control group participants in experiments in social work settings need not be denied services.~
~
 They can receive alternate, routine services, or be put on a waiting list to receive the experimental intervention.~
~
 • Although the classical experiment with random assignment of subjects guards against most threats to internal validity, additional methodological efforts may be needed to prevent or alleviate the following problems: (a) measurement bias, (b) research reactivity, (c) diffusion or imitation of treatments, (d) compensatory equalization, (e) compensatory rivalry, (f) resentful demoralization, and (g) attrition.~
~
 • Techniques for minimizing attrition include reimbursing participants for their participation, avoiding intervention or research procedures that disappoint or frustrate them, and tracking participants.~
~
 • Many experimental studies fail to include measurement procedures, such as blind raters, to control for researcher or practitioner bias toward perceiving results that would confirm the hypothesis.~
~
 • Experimental demand characteristics and experimenter expectancies can hinder the validity of experimental findings if they influence research participants to cooperate with what experimenters want them to say or do.~
~
 • Obtrusive observation occurs when the participant is keenly aware of being observed and thus may be predisposed to behave in ways that meet experimenter expectancies.~
~
 In contrast, unobtrusive observation means that the participant does not notice the observation.~
~
 • External validity refers to the extent to which we can generalize the findings of a study to settings and populations beyond the study conditions.~
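~
 The points above on randomization and the classical experiment can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the client identifiers, the function names (randomly_assign, mean_change, estimated_effect), and the idea of summarizing the result as a simple difference in mean pretest-to-posttest change are hypothetical choices made here, not procedures prescribed in this chapter.~
~
```python
import random
from statistics import mean


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two groups of near-equal size.

    Returns (experimental_group, control_group). Chance alone decides who
    lands in which group, which is what makes the groups comparable.
    """
    rng = random.Random(seed)      # a seed makes the illustration repeatable
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_change(scores):
    """Average posttest-minus-pretest change for (pretest, posttest) pairs."""
    return mean(post - pre for pre, post in scores)


def estimated_effect(experimental_scores, control_scores):
    """Change in the experimental group beyond the change in the control group."""
    return mean_change(experimental_scores) - mean_change(control_scores)


# Hypothetical roster of 20 clients, used only to show the assignment step.
client_ids = [f"client_{i:02d}" for i in range(1, 21)]
experimental, control = randomly_assign(client_ids, seed=42)
```

 Subtracting the control group's change is what separates the stimulus from influences, such as history or maturation, that would affect both groups alike.~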
~
  Review Questions and Exercises~
~
 1. Pick three of the threats to internal validity discussed in this chapter and make up examples (other than those discussed in the chapter) to illustrate each.~
~
 2. A director of a prison Bible studies program claims that his program prevents recidivism.~
~
 His claim is based on data showing that only 14 percent of the inmates who complete his Bible studies program recidivate, as compared to 41 percent of the inmates who choose not to participate in his program.~
~
 Based on what you read in this chapter, explain why his claim is not warranted.~
~
 3. What potential threats to the validity of the findings can you detect in the following hypothetical design? In a residential treatment center containing four cottages, the clinical director develops a new intervention to alleviate behavior problems among the children residing in the four cottages.~
~
 The center has four therapists, each assigned to a separate cottage.~
~
 The clinical director selects two cottages to receive the new intervention.~
~
 The other two will receive the routine treatment.~
~
 To measure outcome, the clinical director assigns a social work student whose field placement is at the center to spend an equal amount of time at each cottage observing and recording the number of antisocial behaviors each child exhibits and the number of antisocial statements each makes.~
~
 4. Briefly sketch an experimental design for testing a new intervention in your fieldwork agency or in another social work agency with which you are familiar.~
~
 Then conduct a qualitative (open-ended, semistructured) interview with one or two direct-service practitioners and an administrator in that agency, asking them how feasible it would be to carry out your study in their agency.~
~
 5. A newspaper article (Perlman, 1982) discussed arguments that linked fluoridation to acquired immune deficiency syndrome (AIDS), citing this evidence: "While half the country's communities have fluoridated water supplies, and half do not, 90% of AIDS cases are coming from fluoridated areas and only 10% are coming from nonfluoridated areas."~
~
 Discuss this in terms of what you have learned about the criteria of causation, indicating what other variables might be involved.~
~
 6. A study with an exceptionally high degree of internal validity conducted with Native Alaskan female adolescents who have recently been sexually abused concludes that an intervention is effective in preventing substance abuse among its participants.~
~
 Explain how this study can have little external validity from one perspective, yet a good deal of external validity from another perspective, depending upon the target population of the practitioners who are utilizing the study as a potential guide to their evidence-based practice.~
~
  Internet Exercises~
~
 1. Find a study reporting an experiment that evaluated the effectiveness of a social work intervention.~
~
 How well did the study control for the additional threats to validity discussed in this chapter? What efforts did it make to alleviate attrition? Were its measurement procedures obtrusive or unobtrusive? Do they appear to be free from serious bias? Also critique the study's external validity, either positively or negatively.~
~
 2. Find a study that used a pre-experimental design to evaluate the outcome of a social work intervention.~
~
 Critique the study's internal validity and discuss whether it had value despite its pre-experimental nature.~
~
 3. In this chapter, we looked briefly at the problem of placebo effects.~
~
 On the web, find a study in which the placebo effect figured importantly.~
~
 Briefly summarize the study, including the source of your information.~
~
 (Hint: You might want to do a search using the term placebo as a key word.)~
~
  Additional Readings~
~
 Campbell, Donald, and Julian Stanley. 1963. Experimental and Quasi-Experimental Designs for Research. Chicago: Rand McNally.~
~
 An excellent analysis of the logic and methods of experimentation in social research, this book is especially useful in its application of the logic of experiments to other social research methods.~
~
 Though fairly old, this book has attained the status of a classic and is still cited frequently.~
~
 Cook, Thomas D., and Donald T. Campbell. 1979. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Chicago: Rand McNally.~
~
 This work is an expanded and updated version of Campbell and Stanley.~
~
 Shadish, William R., Thomas D. Cook, and Donald T. Campbell. 2001. Experimental and Quasi-Experimental Designs for Generalized Causal Inference. New York: Houghton Mifflin.~
~
 This excellent book is a successor to the books mentioned above by Campbell and Stanley and by Cook and Campbell.~
~
 One primary difference from the earlier books is its increased attention to external validity, epistemology (refuting philosophical attacks on the possibility of objectivity), and designs without random assignment.~
~