Sample size in qualitative research Margarete Sandelowski

Research in Nursing & Health, 1995, 18, 179-183

Focus on Qualitative Methods

Sample Size in Qualitative Research

Margarete Sandelowski

A common misconception about sampling in qualitative research is that numbers are unimportant in ensuring the adequacy of a sampling strategy. Yet, sample sizes may be too small to support claims of having achieved either informational redundancy or theoretical saturation, or too large to permit the deep, case-oriented analysis that is the raison d’être of qualitative inquiry. Determining adequate sample size in qualitative research is ultimately a matter of judgment and experience in evaluating the quality of the information collected against the uses to which it will be put, the particular research method and purposeful sampling strategy employed, and the research product intended. © 1995 John Wiley & Sons, Inc.

A common misconception about sampling in qualitative research is that numbers are unimportant in ensuring the adequacy of a sampling strategy. The “logic and power” (Patton, 1990, p. 169) of the various kinds of purposeful sampling used in qualitative research lie primarily in the quality of information obtained per sampling unit, as opposed to their number per se. Moreover, an aesthetic thrust of sampling in qualitative research is that small is beautiful.¹ Yet, inadequate sample sizes can undermine the credibility of research findings. There are no computations or power analyses that can be done in qualitative research to determine a priori the minimum number and kinds of sampling units required, but there are factors, including the aim of sampling and the type of purposeful sampling and research method employed, which researchers can consider to help them decide whether they have collected enough data. These factors are the subject of this article.

¹ I am indebted to one of the anonymous reviewers of this article for the phrasing “small is beautiful.”

NEITHER SMALL NOR LARGE, BUT TOO SMALL OR TOO LARGE

Adequacy of sample size in qualitative research is relative, a matter of judging a sample neither small nor large per se, but rather too small or too large for the intended purposes of sampling and for the intended qualitative product. A sample size of 10 may be judged adequate for certain kinds of homogeneous or critical case sampling, too small to achieve maximum variation of a complex phenomenon or to develop theory, or too large for certain kinds of narrative analyses.

Reported sample sizes are often too small to support claims of having achieved either informational redundancy (Lincoln & Guba, 1985) or theoretical saturation (Strauss & Corbin, 1990).

Margarete Sandelowski, PhD, RN, is a professor, Department of Women’s and Children’s Health, School of Nursing, University of North Carolina at Chapel Hill.

This article is part of the ongoing series, Focus on Qualitative Methods, edited or contributed by Dr. Sandelowski.

This article was received on September 7, 1994, revised, and accepted for publication November 28, 1994.

Requests for reprints should be addressed to Dr. Sandelowski, University of North Carolina at Chapel Hill, #7460 Carrington Hall, Chapel Hill, NC 27599-7460.

© 1995 John Wiley & Sons, Inc. CCC 0160-6891/95/020179-05


Impatience, an a priori commitment to what will be seen, or a disinclination to see any more may incline researchers to stop sampling prematurely. Seeing nothing new in newly sampled units or feeling comfortable that a theoretical category has been saturated are functions involving the recognition of what is there and what can be made out of the data already collected, and then deciding whether it is sufficient to create an intended product. These functions are acquired through experience. For example, I have noticed in my own development and that of students with whom I have worked that beginning qualitative researchers often require more sampling units than more experienced researchers to “see” and to “make.” One expert qualitative researcher (P. Stern, personal communication, 1989) intimated that we often have all the data we will need in the very first pieces of data we collect, but that we do not (or cannot) know that until we collect more. Ultimately, information can be deemed redundant or theoretical lines deemed saturated only for now (Morse, 1989).

Conversely, sample sizes may be too large to support claims to having completed detailed analyses of data, especially the microanalysis demanded by certain kinds of narrative and observational studies. Even in qualitative projects aimed at explicating regularities across pieces of data, a high premium is still placed on discerning the particularities or idiosyncrasies presented by each piece of data. While qualitative studies may involve what are considered large sample sizes (over 50), qualitative analysis is generically about maximizing understanding of the one in all of its diversity; it is case-oriented, not variable-oriented (Ragin & Becker, 1989). Any sample size interfering with the case-oriented thrust of qualitative work can, accordingly, be judged too large.

ISSUES IN PURPOSEFUL SAMPLING

One of the major differences between qualitative and quantitative research approaches is that qualitative approaches typically involve purposeful sampling, while quantitative approaches usually involve probability sampling (Kuzel, 1992; Morse, 1986, 1989; Patton, 1990). Patton (1990) described 14 different types of purposeful sampling, involving the selection for in-depth study of typical, atypical, or, in some way, exemplary “information-rich cases” (p. 169). Researchers in both domains of inquiry often have to resort to sampling they know is less than ideal for their purposes, but qualitative researchers value the deep understanding permitted by information-rich cases and quantitative researchers value the generalizations to larger populations permitted by random and statistically representative samples. Although a sample of one will never be sufficient to permit generalization of findings to populations, it may be sufficient to permit the valuable kind of generalizations that can be made from and about cases, variously referred to as idiographic, holographic, naturalistic, or analytic generalizations (Firestone, 1993; Lincoln & Guba, 1985; Ragin & Becker, 1992; Simons, 1980; Stake & Trumbull, 1982).

In qualitative research, events, incidents, and experiences, not people per se, are typically the objects of purposeful sampling (Miles & Huberman, 1994; Strauss & Corbin, 1990). People, in addition to sites, artifacts, documents, and even data that have already been collected, are sampled for the information they are likely to yield about a particular phenomenon. Sample size in qualitative research may refer to numbers of persons, but also to numbers of interviews and observations conducted or numbers of events sampled. People are certainly central in all kinds of inquiry approaches in the health sciences, but they enter qualitative studies primarily by virtue of having direct and personal knowledge of some event (e.g., illness, pregnancy, life transition) that they are able and willing to communicate to others and only secondarily by virtue of demographic characteristics (e.g., age, race, sex).

People Versus Purpose

When qualitative researchers decide to seek people out because of their age or sex or race, it is because they consider them good sources of information that will advance them toward an analytic goal and not because they wish to generalize to other persons of similar age, sex, or race. That is, a demographic variable, such as sex, becomes an analytic variable; persons of one or the other sex are selected for a study because, by virtue of their sex, they can provide certain kinds of information. Accordingly, only as many persons of a particular sex are included in a study as is necessary to obtain that information. There is no mandate to have equivalent numbers of women or men or numbers of persons of each sex in the proportions in which they appear in a certain population.

Sampling on the basis of demographic characteristics presents something of a problem in achieving both informational and size adequacy in qualitative studies. There is currently a strong impulse (and federal mandate) to eliminate gender, race/ethnicity, and class bias in research by including members of minority or traditionally disempowered groups typically underrepresented in research, and by including women and men typically underrepresented in certain domains of research, such as men in family studies and women in studies of heart disease. Trost (1986) described a “statistically nonrepresentative stratified” sampling strategy whereby researchers can select persons varying in demographic characteristics to achieve representative coverage and inclusion. That is, while the sample is statistically nonrepresentative, it is informationally representative in that data will be obtained from persons who can stand for other persons with similar characteristics. In her illustration involving a study of families with teenagers, five sets of naturally and artificially dichotomized variables (one- or two-parent family, one or two or more children, housed in an apartment or home, with a high or low income, and with a male or female teenager) were combined to yield 32 kinds of families to be sampled. A similar kind of sampling plan can be used to ensure inclusion of females and males, and persons varying in social class, race, cultural affiliation, religion, or other dimension.
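The arithmetic behind the 32 kinds of families can be sketched as follows. This is an illustration only: the dictionary keys and labels are hypothetical shorthand for the five dichotomies listed above, not a coding scheme taken from Trost (1986).

```python
from itertools import product

# Hypothetical labels for the five dichotomized variables in the example above.
dichotomies = {
    "parents": ("one-parent", "two-parent"),
    "children": ("one child", "two or more children"),
    "housing": ("apartment", "home"),
    "income": ("high income", "low income"),
    "teenager": ("male teenager", "female teenager"),
}

# Taking one value from each variable defines one kind of family (one cell).
family_types = [
    dict(zip(dichotomies, combination))
    for combination in product(*dichotomies.values())
]

print(len(family_types))  # 2 ** 5 = 32 cells
```

Because the number of cells doubles with each added dichotomy (a sixth would give 64), a plan of this kind can grow quickly past what a single case-oriented study can analyze in depth.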

Although this kind of sampling accommodates a new, laudable, and necessary moral consciousness concerning underrepresented and, therefore, often misrepresented groups by partially accommodating the logic of probability sampling, it may wholly contravene the logic of purposeful sampling. Strictly speaking, sampling for variation in race, class, gender, or other such background or person-related characteristics ought to be done in qualitative studies when they are deemed analytically important and where the failure to sample for such variation would impede understanding or invalidate findings (Cannon, Higginbotham, & Leung, 1988). Deciding a priori that a sample will include a certain number or percentage of individuals in various demographic groups may meet federal and other mandates for inclusion of traditionally excluded persons, but it may also result in a sample with a kind of variation that has little analytic significance or detracts from analysis goals (Morse, 1989). More importantly, such a sample may be too small adequately to address the analytic importance of such factors as gender or race, or, alternatively, too large to favor the deep analysis that qualitative projects mandate.

One way to resolve this dilemma is to design studies in which a phenomenon is investigated in one group at a time (either simultaneously or sequentially). The design for such studies will include more than one purposeful sampling strategy: for example, homogeneous and maximum variation sampling, where person-related homogeneity is maintained while variation in the target phenomenon is sought. After a series of such studies has been completed, a larger synthesis of findings can be undertaken in which the researcher can more adequately address the question of whether and how a variable such as gender is important in understanding a phenomenon.

SAMPLE SIZE IN DIFFERENT KINDS OF PURPOSEFUL SAMPLING

Different kinds of purposeful sampling require different minimum sample sizes. For example, in deviant case sampling, where the intention is to understand a very unusual or atypical manifestation of some phenomenon, one case may be sufficient. Yet, even a sample of one requires within-case sampling (Miles & Huberman, 1994). The researcher must decide which of the varieties of data concerning the case to sample to explicate its atypicality. This is especially evident in cases involving aggregates of one, such as a family, community, or organization. Even when an individual is the focal one, the researcher must sample from the wealth of data obtainable from and about that individual. In short, any one case offers a variety of data that must be sampled in sufficient quantity to make the case.

Maximum variation is one of the most frequently employed kinds of purposeful sampling in qualitative nursing research and typically requires the largest minimum sample size of any of the purposeful sampling strategies. As in any kind of sampling, the more variability there is within the confines of a qualitative project, the more sampling units the researcher will require to reach informational redundancy or theoretical saturation. Researchers wanting maximum variation in their sample must decide what kind(s) of variation they want to maximize and when to maximize each kind. One kind of variation already described is demographic variation, where variation is sought on generally people-related characteristics.

A second kind of variation is phenomenal variation, or variation on the target phenomenon under study. For example, the target phenomenon in a study of couples who have obtained positive fetal diagnoses is diagnosis, which varies on such dimensions as type and time of diagnosis, and the instrumentation used to make it. Like the decision to seek demographic variation, the decision to seek phenomenal variation is often made a priori in order to have representative coverage of variables likely to be important in understanding how diverse factors configure a whole. This kind of sampling is also referred to as selective or criterion sampling, where sampling decisions are made going into a study on “reasonable” grounds, rather than on analytic grounds after some data have already been collected (Glaser, 1978, p. 37; Schatzman & Strauss, 1973).

A third kind of variation is theoretical variation, or variation on a theoretical construct that is associated with theoretical sampling, or the sampling on analytic grounds characteristic of grounded theory studies. A theoretical sampling strategy is employed to fully elaborate and validate theoretically derived variations discerned in the data. Initial sampling for phenomenal variation permits these theoretical variations to be identified. A program of research employing grounded theory typically begins with a selective or criterion sampling strategy aimed at phenomenal variation and then proceeds to theoretical sampling (Sandelowski, Holditch-Davis, & Harris, 1992).

Researchers control the number of sampling units required to achieve informational redundancy or theoretical saturation by deciding which category of variation to maximize and minimize. This decision is a matter of fitting the sampling strategy to the purpose of and method chosen for a particular study and appraising the resources (including number of investigators and financial support) available to conduct the study. For example, purposeful sampling for demographic homogeneity and selected phenomenal variation is a way a researcher working alone with limited resources can reduce the minimum number of sampling units required within the confines of a single research project, but still produce credible and analytically and/or clinically significant findings.

SAMPLE SIZES FOR DIFFERENT QUALITATIVE METHODS

Just as different purposeful sampling strategies require different minimum sample sizes, different qualitative methods require different minimum sample sizes. Morse (1994) has recommended that phenomenologies directed toward discerning the essence of experiences include about six participants, ethnographies and grounded theory studies, about 30 to 50 interviews and/or observations, and qualitative ethological studies, about 100 to 200 units of observation.

Additional considerations in matching sample size to method are within-method diversity and the multiple uses of a method. Phenomenology offers a good illustration of how within-method diversity and the particular use to which a method is put can alter the requirements for sample size. In a phenomenological case study, one case can be sufficient to show something about an experience that a researcher deems significant for special display (e.g., Wertz, 1983). One case will not be sufficient, however, if the researcher’s intention is to describe invariant or essential features of an experience. For example, a phenomenological study, as interpreted by Van Kaam (1959), will likely require 10 to 50 descriptions of a target experience in order to discern its necessary and sufficient constituents. When phenomenological techniques are used in the service of a goal other than to produce a phenomenology, such as generating items for an instrument, at least 25 descriptions of an experience will likely be required.

SAMPLE SIZES IN COMBINED QUALITATIVE AND QUANTITATIVE STUDIES

Studies combining qualitative and quantitative approaches involve additional considerations in determining sufficient sample size. Indeed, so-called methodologically triangulated studies present researchers with many dilemmas (beyond the scope of this article), the resolution of which depends on the researcher’s stance concerning the compatibility of the philosophies and practices of qualitative and quantitative inquiry.

With respect to sampling, the logics of probability and purposeful sampling are arguably sufficiently irreconcilable in most cases to preclude using the same subjects for both quantitative and qualitative purposes (Morse, 1991). Subjects selected for the purposes of statistical representativeness may not fulfill the informational needs of the study, while participants selected for information purposes do not meet the requirement of statistical representativeness. Accordingly, whether primarily quantitative or qualitative, or whether designed for purposes of completeness or confirmation (Breitmayer, Ayres, & Knafl, 1993), such combination studies would require two samples drawn simultaneously or sequentially according to the two logics of sampling.


Yet, it can also be argued that among persons chosen according to the logic of probability sampling, there will likely be articulate informants whose selection for the qualitative portion of a combined study can be justified as purposeful. The purposeful sample would have to be expanded only if the data obtainable from the participants already sampled was deemed informationally insufficient. Similarly, no additional sampling may be necessary in studies where further information obtainable from standardized instruments is desired about a purposefully drawn sample. The caveat here is that the researcher use the data from these instruments for purposes of fuller description, rather than to draw statistical inferences.

CONCLUSION

Determining an adequate sample size in qualitative research is ultimately a matter of judgment and experience in evaluating the quality of the information collected against the uses to which it will be put, the particular research method and sampling strategy employed, and the research product intended. Numbers have a place in ensuring that a sample is fully adequate to support particular qualitative enterprises. A good principle to follow is: An adequate sample size in qualitative research is one that permits, by virtue of not being too large, the deep, case-oriented analysis that is a hallmark of all qualitative inquiry, and that results in, by virtue of not being too small, a new and richly textured understanding of experience.

REFERENCES

Breitmayer, B. J., Ayres, L., & Knafl, K. A. (1993). Triangulation in qualitative research: Evaluation of completeness and confirmation purposes. Image: Journal of Nursing Scholarship, 25, 237-243.

Cannon, L. W., Higginbotham, E., & Leung, M. L. (1988). Race and class bias in qualitative research on women. Gender & Society, 2, 449-462.

Firestone, W. A. (1993). Alternative arguments for generalizing from data as applied to qualitative research. Educational Researcher, 22, 16-23.

Glaser, B. G. (1978). Theoretical sensitivity: Advances in the methodology of grounded theory. Mill Valley, CA: Sociology Press.

Kuzel, A. J. (1992). Sampling in qualitative inquiry. In B. F. Crabtree & W. L. Miller (Eds.), Doing qualitative research (pp. 31-44). Newbury Park, CA: Sage.

Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Beverly Hills, CA: Sage.

Miles, M. B., & Huberman, A. M. (1994). Qualitative data analysis: An expanded sourcebook (2nd ed.). Thousand Oaks, CA: Sage.

Morse, J. M. (1986). Quantitative and qualitative research: Issues in sampling. In P. L. Chinn (Ed.), Nursing research methodology: Issues and implementation (pp. 181-193). Rockville, MD: Aspen.

Morse, J. M. (1989). Strategies for sampling. In J. M. Morse (Ed.), Qualitative nursing research: A contemporary dialogue (pp. 117-131). Rockville, MD: Aspen.

Morse, J. (1991). Approaches to qualitative-quantitative methodological triangulation. Nursing Research, 40, 120-123.

Morse, J. M. (1994). Designing funded qualitative research. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (pp. 220-235). Thousand Oaks, CA: Sage.

Patton, M. Q. (1990). Qualitative evaluation and research methods (2nd ed.). Newbury Park, CA: Sage.

Ragin, C. C., & Becker, H. S. (1989). How the microcomputer is changing our analytic habits. In G. Blank, J. L. McCartney, & E. Brent (Eds.), New technology in society: Practical applications in research and work (pp. 47-55). New Brunswick, NJ: Transaction.

Ragin, C. C., & Becker, H. S. (1992). What is a case? Exploring the foundations of social inquiry. Cambridge: Cambridge University Press.

Sandelowski, M., Holditch-Davis, D., & Harris, B. G. (1992). Using qualitative and quantitative methods: The transition to parenthood of infertile couples. In J. F. Gilgun, K. Daly, & G. Handel (Eds.), Qualitative methods in family research (pp. 301-322). Newbury Park, CA: Sage.

Schatzman, L., & Strauss, A. (1973). Field research: Strategies for a natural sociology. Englewood Cliffs, NJ: Prentice-Hall.

Simons, H. (Ed.). (1980). Towards a science of the singular: Essays about case study in educational research and evaluation. Norwich: University of East Anglia, Center for Applied Research in Education.

Stake, R. E., & Trumbull, D. J. (1982). Naturalistic generalizations. Review Journal of Philosophy and Social Science, 7, 1-12.

Strauss, A., & Corbin, J. (1990). Basics of qualitative research: Grounded theory procedures and techniques. Newbury Park, CA: Sage.

Trost, J. E. (1986). Statistically nonrepresentative stratified sampling: A sampling technique for qualitative studies. Qualitative Sociology, 9, 54-57.

Van Kaam, A. L. (1959). Phenomenal analysis: Exemplified by a study of the experience of “really feeling understood.” Journal of Individual Psychology, 15, 66-72.

Wertz, F. J. (1983). From everyday to psychological description: Analyzing the moments of a qualitative data analysis. Journal of Phenomenological Psychology, 14, 197-241.
