The computer simulation of sociological surveys


COMPUTERS IN BEHAVIORAL SCIENCE

THE COMPUTER SIMULATION OF SOCIOLOGICAL SURVEYS

by Dean Harper, University of Rochester, New York

A computer simulation of a sociologist conducting sample surveys was devised in order to construct a general theoretical framework for the investigation of social phenomena, to study the implications of any sociological theory of surveys, to explore the utility of different methods of analyzing survey data, to create a Turing process, and to train graduate sociology students. The model of the simulation had five components: population growth and change, sampling, opinion formation, opinion change, and opinion measurement. Random error, social and ecological influence, and measurement error were incorporated into these segments; however, in particular instances of using the simulation these can be included or bypassed. One instance of the use of the simulation, which incorporated random error and measurement error but not influence, revealed that estimates of means were not altered by error, but that the association between variables was reduced by the presence of error.


OBJECTIVES

THIS ATTEMPT to simulate a scientist conducting a sociological survey¹ was originally motivated by the desire to devise a theory that would explain a particular set of data obtained from a study on the sociology of mental illness. As the work progressed, the substance of the original survey was forgotten; in its place emerged the goal of devising a general framework into which any specific theory could be incorporated and which would be useful in a wide variety of contexts of survey research.

For example, consider two investigations: one a study of attitudes about mental disorder and the other of attitudes on student demonstrations. In each, the investigator is interested in inferring the “underlying causal structure,” e.g., what causes people to feel that certain behavior is symptomatic of mental disorder? What causes people to be unsympathetic with political activities of students? The two investigations will not necessarily invoke the same theory to explain both sets of data. However, it may be possible to incorporate each distinct substantive theory into the same formal structure. In each study, the investigator might assume, for example, that each respondent can be characterized by a number of general and unobservable dimensions that, with some probability, determine his responses to survey questions. Alternatively, the social scientist might not use the notion of unobservable dimensions, but rather assume that all observations, i.e., answers to questions, are measurements on continuous variables that can be related to each other by some kind of unknown mathematical relationship. Thus, there are alternative frameworks, any one of which could be used for substantively different surveys. The first objective, then, was to devise a theoretical framework into which any particular theory could be cast, and which could be used either to explain a set of real data or to generate a set of artificial data.

¹ For another example of the simulation of social research, see Sonquist (1968).

The second objective emerged from the first. That goal was to study the implications of a particular theory. For example, in a study of political attitudes, an investigator may have a theory about the effects of public catastrophes, such as riots, demonstrations, and the like, on political attitudes. The simulation may represent this in the following way: each public event constitutes a shock to the attitudes held by individuals. The degree of shock varies for different types of individuals and declines, over the short run, with the occurrence of each successive event. Then the implications of the theory can be studied by examining data generated under different conditions, e.g., attitudes resulting from one event versus several events, for different populations, and so on.

Behavioral Science, Volume 17, 1972

Social scientists have used different techniques for the analysis of survey data. Among these are tabular analysis, factor analysis, latent structure analysis, regression analysis, and path analysis (Hirschi & Selvin, 1967). Inasmuch as these different techniques have different requirements for their use and were devised for different situations, comparison of them would seem to make little sense. However, in each instance, the user attempts to infer the underlying causal structure. So, to the extent that these methods are used to achieve the same goal, they can be compared: What is their relative utility in achieving the goal of providing an explanation for survey data? By simulating a set of data with a known causal structure, known random errors, and known nonrandom errors, and analyzing it against its known internal structure, a social scientist can gain a sense of the utility of any particular technique of data analysis.² To investigate the relative utility of different data analysis methods was a third goal.

A fourth objective was to create a Turing process. In discussing the question, “do computers think?,” Turing (1950) suggested that a computer can be said to think if an individual, when interacting with a computer, say in a game of chess, does not know whether he is interacting with a computer or another human being; such a computer has come to be called a “Turing machine.” Thus, another goal of this simulation was to devise a computer simulation that would generate data that would fool the sociologist; in analyzing the survey data, the sociologist would not know whether he was working with artificial data or real data.³

² For examples of methodological simulations see any recent issue of the Journal of the American Statistical Association. In the June 1971 issue there are articles by Sobel (1971) and Fama and Roll (1971), among others. Also see Wagner (1958), Nagar (1960), Cliff and Hamburger (1967) and Cliff and Penell (1967) for other examples.

A fifth objective grew out of the first four: to use a simulated survey to train graduate sociology students.⁴ The simulation was used to train students in (1) questionnaire construction, (2) research design, (3) data analysis, and (4) theory construction.

THE MODEL UNDERLYING THE SIMULATION

The computer model has five major components with corresponding computer programs for each component. These components are (1) population growth and change, (2) sampling, (3) opinion formation, (4) opinion change, and (5) opinion measurement. For particular simulations, some components or parts of components can be bypassed.

Population growth and change

In this component, the user must initially specify the demographic characteristics of the population, i.e., age, sex, race distributions and the like. These might be the characteristics of the United States population, the population of some smaller geographic entity, or a hypothetical population. After the population has been specified, the main program can go to the next component, the sampling segment. In this case, the population is static and the population change segment is bypassed. Alternatively, the initially specified population can change by the occurrence of the usual demographic processes of birth, death, marriage, as well as by processes of education, seeking employment, and the like. In this instance, the user must specify the values of various parameters which control the population changes, i.e., birth rates, death rates, etc.
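The two options described here, a static population or one that changes under user-supplied rates, might be sketched as follows. This is only an illustration under stated assumptions: the function names, the category shares, the age range and the birth and death rates are all hypothetical and do not come from the article.

```python
import random

# Hypothetical category shares; the article does not report its distributions.
RACE_SHARES = {"white": 0.88, "black": 0.12}
SEX_SHARES = {"female": 0.51, "male": 0.49}


def draw_category(rng, shares):
    """Draw one category with probability proportional to its share."""
    categories = list(shares)
    weights = [shares[c] for c in categories]
    return rng.choices(categories, weights=weights, k=1)[0]


def make_population(size, rng):
    """Build a static population of persons with age, sex and race."""
    return [
        {
            "age": rng.randint(15, 84),  # assumed age range
            "sex": draw_category(rng, SEX_SHARES),
            "race": draw_category(rng, RACE_SHARES),
        }
        for _ in range(size)
    ]


def one_year_of_change(population, rng, death_rate=0.01, birth_rate=0.015):
    """Apply one cycle of assumed demographic change: deaths, aging, births."""
    survivors = [p for p in population if rng.random() >= death_rate]
    for person in survivors:
        person["age"] += 1
    births = sum(rng.random() < birth_rate for _ in survivors)
    newcomers = [
        {
            "age": 0,
            "sex": draw_category(rng, SEX_SHARES),
            "race": draw_category(rng, RACE_SHARES),
        }
        for _ in range(births)
    ]
    return survivors + newcomers
```

Skipping `one_year_of_change` altogether corresponds to the static case in which the population change segment is bypassed.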

The purpose of this component is not to model demographic processes⁵ per se, but rather to study the effects of alternative demographic processes on opinion formation. For example, a survey researcher might wish to study attitudes toward population problems and how these vary in populations of different educational attributes. This is not only a matter of comparing those with high education against those with low education; rather, it is studying opinion formation in populations which have differing rates of educational growth.

³ A set of simulated data was given to three sociologists, Allan Barton, John Sonquist and Hal Winsborough, to analyze. For different reasons, each suspected that the data were artificial.

⁴ This use of the simulation is described in Harper and Selvin (1970). For other uses of computers in training students, see Cline and Meyers (1970).

Sampling

In this component, the user can specify a sampling scheme, e.g., random sampling, stratified sampling, quota sampling and so on. When the simulated survey is used for training purposes, a student is “given” a sum of money which he can “spend” on the study; he specifies a particular research design (e.g., panel study, cross sectional study), a sampling design, sample size, and a substantive questionnaire. Each part of his study (design, interviewing, coding, data analysis) consumes a portion of his funds. His objective is to conduct the study within the financial limits imposed on him.
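Two of the sampling schemes named above, together with the budget constraint placed on a student, could look roughly like this. The function names and the cost model (a fixed cost plus a flat cost per interview) are assumptions made for the sketch; the article does not say how costs were charged.

```python
import random


def simple_random_sample(population, n, rng):
    """Draw n respondents without replacement."""
    return rng.sample(population, n)


def stratified_sample(population, key, n_per_stratum, rng):
    """Draw up to n_per_stratum respondents from each stratum defined by key."""
    strata = {}
    for person in population:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


def affordable_sample_size(budget, fixed_costs, cost_per_interview):
    """Largest number of interviews the remaining funds can pay for."""
    remaining = budget - fixed_costs
    if remaining <= 0:
        return 0
    return int(remaining // cost_per_interview)
```

A student with 10,000 units of funds, 4,000 of which go to design, coding and analysis, could afford `affordable_sample_size(10000, 4000, 25)` interviews at 25 units each.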

Opinion formation

The process of opinion formation can be represented in one of three ways. The user must specify which representation he will use; each representation has specific parameters that must be set to particular numeric values.

One of these representations, the recursive equations model (Blalock, 1961, p. 64), will be discussed in detail; the other two representations will be described more briefly. In this first representation, individuals are characterized by independent variables (typically, these are demographic variables, such as age, sex, race), one or more sets of intervening variables (experiences, such as participating in political activities; psychological dimensions, such as anomie, authoritarianism) and dependent variables (attitudes and opinions, which are the subject of the investigation).

⁵ For some computer simulations of demographic processes, see Sheps (1969), Ridley and Sheps (1966), Potter, Sakoda and Feinberg (1968), May and Heer (1968) and Holmberg (1968).

All of the variables are assumed to be continuous over some specified range. Discrete variables, such as race, are assumed to be continuous; however, the observation of discrete variables results in one of two or three values in the range of the continuous values.

Then, the relationship between these variables is expressed in a series of recursive equations:

$$
\begin{aligned}
Y_{1i} &= e_{1i}, \\
Y_{2i} &= f_{2i}(Y_{1i}) + e_{2i}, \\
Y_{3i} &= f_{3i}(Y_{1i}, Y_{2i}) + e_{3i}, \\
&\;\;\vdots \\
Y_{ki} &= f_{ki}(Y_{1i}, Y_{2i}, \ldots, Y_{k-1,i}) + e_{ki},
\end{aligned}
\tag{1}
$$

where $i = 1, \ldots, n_j$ and $j = 1, \ldots, k$. The equations contain linear and quadratic terms that correspond to main effects, quadratic effects and interaction effects.

Random error is contained in the terms $e_{1i}, e_{2i}, \ldots, e_{ki}$. This is the effect of all exogenous variables on the variables in the system. With complete knowledge of the real world, it would be possible to extend the equations to include all independent and intervening variables which have effects on the dependent variables and thereby reduce the error terms to a quantity close to zero. Random error is assumed to be normally distributed with a mean of zero and some specified standard deviation. For particular runs, the user can assume that there is no random error (i.e., the standard deviation of the distribution of error is zero), which is equivalent to assuming that all relevant variables have been included.
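The recursive scheme with optional random error can be sketched for a single respondent as below. It is a minimal illustration, not the article's program: the names `generate_scores` and `EQUATIONS`, the three-variable system and every coefficient in it are hypothetical.

```python
import random


def generate_scores(equations, error_sd, rng):
    """Evaluate recursive equations in order for one respondent.

    equations: list of (name, function) pairs; each function receives the
    dict of scores generated so far and returns the systematic part f(...).
    error_sd: standard deviation of the normally distributed error term;
    zero removes random error entirely.
    """
    scores = {}
    for name, systematic_part in equations:
        error = rng.gauss(0.0, error_sd) if error_sd > 0 else 0.0
        scores[name] = systematic_part(scores) + error
    return scores


# Hypothetical system with a main effect, a quadratic effect and an
# interaction effect; the coefficients are illustrative only.
EQUATIONS = [
    ("y1", lambda s: 0.0),  # first-level variable: error term only
    ("y2", lambda s: 0.5 * s["y1"] + 0.1 * s["y1"] ** 2),
    ("y3", lambda s: 0.3 * s["y1"] - 0.4 * s["y2"] + 0.2 * s["y1"] * s["y2"]),
]
```

Because each equation only reads scores produced by earlier equations, the list order enforces the recursive structure; calling `generate_scores(EQUATIONS, 0.0, rng)` corresponds to a run without random error.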

The psychological variables are measured by specific items. For example, an anomie scale results from the answer to a single item or from an additive scale combining the responses to several items. In this representation, the psychological variables are not underlying factors revealed by factor analysis.


The simulation generates a series of scores, one for each intervening and dependent variable, for each respondent. As will be seen below, these scores are not directly observable. In that sense they are latent variables. Thus, for each question in the survey, there exists a single latent dimension. However, in this first representation, responses to questions are not functions of a more basic underlying factor structure.

The second representation, in contrast to the first, incorporates an underlying factor structure as intervening between responses to items, as dependent variables, and demographic variables and experiences, as independent variables. The user specifies a small number of factors and the factor loadings for each item. For respondents he specifies the relationship between factor weights and the independent variables. As with the first representation, scores for the dependent variables are generated.

In the third representation, the Markov process model, respondents are assumed to be in one of several latent states. They move between latent states at a rate which is a function of the independent variables. The respondent’s answer to a particular question is probabilistically determined, with the probabilities being a function of the latent states. Values for functions describing movement between states and for the probabilities of response must also be specified. This representation⁶ corresponds with models developed by Coleman (1964).
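A discrete-time sketch of the Markov process representation follows. The state names, the function names and the linear link between education and the movement rate are assumptions made for illustration; the article does not give the functional forms it used, and Coleman's models are formulated in continuous time.

```python
import random


def transition_probs_for(education_level):
    """Hypothetical link between an independent variable and movement rates."""
    move = min(0.05 + 0.05 * education_level, 1.0)
    return {
        "favorable": {"favorable": 1.0 - move, "unfavorable": move},
        "unfavorable": {"favorable": move, "unfavorable": 1.0 - move},
    }


def step_latent_state(state, transition_probs, rng):
    """Move a respondent to a new latent state, or keep the current one.

    transition_probs[state] maps each destination state to its probability.
    """
    destinations = list(transition_probs[state])
    weights = [transition_probs[state][d] for d in destinations]
    return rng.choices(destinations, weights=weights, k=1)[0]


def answer_question(state, response_probs, rng):
    """Draw an answer whose probabilities depend on the latent state."""
    answers = list(response_probs[state])
    weights = [response_probs[state][a] for a in answers]
    return rng.choices(answers, weights=weights, k=1)[0]
```

The two user-specified ingredients in the text map onto the two dictionaries: `transition_probs` describes movement between states and `response_probs` gives the probabilities of response within each state.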

Opinion change

This segment can be bypassed.⁷ When it is used, respondents, who have formed opinions in the previous segment, change their opinions under one or both of two kinds of influence: social effect and ecological effect. Social effect is the influence deriving from the respondent’s place in a network of social relationships with family and friends; ecological effect is the influence that derives from the climate of opinion in the larger community.

⁶ For an earlier discussion of this, see Harper (1968).

⁷ For a similar mechanism, see McPhee (1963).

In the opinion formation segment, the recursive equations and the factor analytic model result in scores for the dependent variables. In the opinion change segment, the score of a respondent is incremented or decremented by some amount such that the changed score is closer to the average scores of the respondent’s friends or those in his community. That is,

$$S' = S_0 + c(\bar{S} - S_0) \tag{2}$$

where $S_0$ is the respondent’s original score, $\bar{S}$ is the average score of friends or community members, $c$ is the coefficient of influence (the larger $c$ is, the greater the influence; $0 \le c \le 1$), and $S'$ is the score of the respondent after social influence.

In the Markov process model of opinion formation, one of the variables which affects the rate of movement of the respondent is the latent state in which the respondent’s friends are located. That is, a respondent is under some pressure to move to the latent state of his friends or of the larger community.

Social influence can operate for one or more cycles. There has been no attempt to let the process run to a point of stability; rather, social influence operates for a fixed number of cycles.
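The influence step for score-based models, run for a fixed number of cycles, can be sketched as follows. The sketch follows the verbal description (each score moves a fraction c of the way toward the average score of the respondent's friends). The function names, the friendship dictionary and the choice to update all respondents simultaneously within a cycle are assumptions, since the article does not state the update order.

```python
def influenced_score(own_score, reference_scores, c):
    """Move own_score a fraction c of the way toward the mean reference score."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie between 0 and 1")
    if not reference_scores:
        return own_score  # no friends or community: nothing to move toward
    mean_reference = sum(reference_scores) / len(reference_scores)
    return own_score + c * (mean_reference - own_score)


def run_influence(scores, friends, c, cycles):
    """Apply social influence for a fixed number of cycles.

    scores: dict respondent -> score; friends: dict respondent -> list of
    respondents. All scores are updated simultaneously within a cycle.
    """
    current = dict(scores)
    for _ in range(cycles):
        current = {
            person: influenced_score(
                score, [current[f] for f in friends.get(person, [])], c
            )
            for person, score in current.items()
        }
    return current
```

With `c = 0` scores never change, and with `c = 1` a respondent adopts the average of his friends outright; ecological influence can be obtained by passing community members instead of friends.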

Opinion measurement

For the recursive equations model and the factor analytic model, the opinion formation component generates attitudes in the form of scores or numerical values on underlying variables; it does not generate answers to survey questions, which is done in this, the opinion measurement component.

This component has two segments. In the first segment, the scores on each attitude dimension are transformed into questionnaire responses. This is equivalent to mapping a set of continuous variables into a set of discrete or integer variables. Such a mapping can be achieved by an arbitrary step function, such as in Fig. 1, which assumes that the answers to a fixed choice question are essentially graded along a continuum. That is, the function is

$$Z' = g(Z) \tag{3}$$

where $Z$ is some $Y_{ji}$.

FIG. 1. Step function relating coded answers (Z') to attitude scores (Z).

Suppose that a question permits one of four responses: “strongly agree,” “agree,” “disagree,” and “strongly disagree.” Then, $Z'$ can take one of four values, viz.,

$$
Z' =
\begin{cases}
Z_0' & \text{if } Z \le Z_0 \quad \text{(equivalent to strongly agree)} \\
Z_1' & \text{if } Z_0 < Z \le Z_1 \quad \text{(equivalent to agree)} \\
Z_2' & \text{if } Z_1 < Z \le Z_2 \quad \text{(equivalent to disagree)} \\
Z_3' & \text{if } Z_2 < Z \quad \text{(equivalent to strongly disagree).}
\end{cases}
$$

In the second segment (the Markov process model moves directly to this segment), respondents are interviewed but their responses are subject to error. Specifically, these errors are (1) the respondent is not home, (2) the respondent refuses to answer a particular question, (3) the respondent terminates the interview, (4) the respondent gives a false answer, and (5) the interviewer records the wrong answer. Each of these five errors occurs with a probability which is a function of the characteristics of the respondent, the characteristics of the question being asked, and the characteristics of the interviewer.

The initial probabilities of these errors need to be specified for each particular simulation. The probabilities of errors (2), (3) and (4) are altered during the interview of each respondent; the probability of error (5) is altered during each interview and over all interviews conducted by that particular interviewer.
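The first segment's step function, equation (3), can be sketched with a sorted list of cut points. The function name `code_answer` and the example cut points are assumptions; only the four answer categories and the boundary convention (a score equal to a cut point falls in the lower category) come from the text.

```python
from bisect import bisect_left

ANSWERS = ["strongly agree", "agree", "disagree", "strongly disagree"]


def code_answer(score, cut_points, answers=ANSWERS):
    """Map a continuous attitude score Z onto a coded answer Z'.

    cut_points holds Z0 < Z1 < Z2; a score equal to a cut point falls in the
    lower category, matching Z <= Z0, Z0 < Z <= Z1, and so on.
    """
    if len(answers) != len(cut_points) + 1:
        raise ValueError("need one more answer than cut points")
    # bisect_left counts the cut points strictly below the score.
    return answers[bisect_left(cut_points, score)]
```

For instance, with hypothetical cut points `[-1.0, 0.0, 1.0]`, a score of exactly `0.0` is coded "agree" and a score of `1.5` is coded "strongly disagree".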

The parameters

In order to run the simulation, the user must assign numeric values to a variety of parameters that characterize the simulation. A specified number of runs can be made with the parameters having the same numeric values or the values can be changed between each run or between each set of runs. A particular instance of a simulation will be reported below.

By varying the values of the parameters, the user can study various processes; for example, he can study the effects of different levels of social and ecological influence by varying the values of c in equation (2) above.

The data

The simulation results in one or more sets of data, viz., numbers on punch cards; this is like the coded data of a real interview and can be analyzed in the ways that real data are analyzed.

Observation of the simulation

A number of observations can be made on the simulated process that cannot be made on the real world. Fig. 2 shows the alternative sequences of operations performed by the simulation, whereby some segments can be bypassed, and indicates the observations that can be made for any pathway through the simulation.

The opinion formation segment provides the option of including or excluding random error, the effect of all independent variables not explicitly included in the model. The user can assume that all independent variables have been included or that some relevant variables have been omitted. Opinion change from social or ecological influence also is optional; it can be bypassed or included. Likewise, measurement error can be assumed to be present or absent. These three (random error, influence and measurement error) will be referred to as effects. Thus, each of the effects can be present or absent. This yields eight different possible sequences through the simulation and hence eight possible sets of observations. Table 1 arrays these eight sequences. In each sequence the numbers correspond to the numbers on the paths of Fig. 2. For example, sequence 1-4-6-7-9-14-16-17 is the case where random error is absent and influence and measurement error are present.

FIG. 2. Flow chart of opinion formation, opinion change and opinion measurement segments of the simulation.

TABLE 1
LISTING OF DIFFERENT PATHWAYS THROUGH FLOW CHART OF FIG. 2 AND THE EFFECTS AND DATA SETS WHICH ARE ASSOCIATED WITH THOSE PATHWAYS

| Path | Effects which are present | Data sets which are generated |
|---|---|---|
| 1-3-8 | None | A( ) |
| 1-3-9-14-15-17 | M | A( ), A(M) |
| 1-4-6-7-8 | I | A( ), A(I) |
| 1-4-6-7-9-14-16-17 | I, M | A( ), A(I), A(M), A(I, M) |
| 2-3-8 | R | A( ), A(R) |
| 2-3-9-10-12-13-16-17 | R, M | A( ), A(R), A(M), A(R, M) |
| … | R, I | A( ), A(R), A(I), A(R, I) |
| … | R, I, M | A( ), A(R), A(I), A(M), A(R, I), A(R, M), A(I, M), A(R, I, M) |

In Fig. 2 there are eight boxes which contain the statements generate A(M), generate A(R, I) and so on. (The symbol A(M) represents the answers or responses given in a survey when only measurement error is present, and so on.) For example, in path 2 is the box generate A(R) which corresponds to the presence of random error and the absence of the two other effects. Each one of the eight boxes corresponds to one of the eight possible patterns of the effects being present or absent.

An examination of Fig. 2 will reveal that seven of the eight sequences result in more than one set of data. This can be described in set notation. Consider the set of three elements, R, I and M, which represent the three effects. Then let

$$T = \{R, I, M\},$$

$$U = \{\{\,\}, \{R\}, \{I\}, \ldots, \{R, I, M\}\}.$$

That is, $U$ consists of the set of subsets of $T$. When a data set is generated, one of the sets of $U$ is selected. Suppose that $V$ represents the set in $U$ that has been selected. In generating the data set corresponding to $V$, the program also generates all data sets that correspond to subsets of $V$; for example, when A(R, M) is generated, A( ), A(R) and A(M) are also generated.
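The rule that selecting a set V also yields the data sets for every subset of V can be illustrated by enumerating subsets. The function name `data_sets_for` and the label format are assumptions made for this sketch.

```python
from itertools import combinations

EFFECTS = ("R", "I", "M")  # random error, influence, measurement error


def data_sets_for(selected):
    """List every data set generated when the effects in `selected` are chosen.

    Returns labels such as "A()" or "A(R, M)", one per subset of `selected`.
    """
    ordered = [e for e in EFFECTS if e in selected]
    labels = []
    for size in range(len(ordered) + 1):
        for subset in combinations(ordered, size):
            labels.append("A(" + ", ".join(subset) + ")")
    return labels
```

Selecting no effects returns only `"A()"`, which is why seven of the eight sequences, and not all eight, produce more than one data set.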

It is as if the social scientist were able to survey a sample of respondents and then resurvey them under different conditions but with the respondents having forgotten their responses given in earlier interviews. For example, if the user assigns a particular set of numeric values to the parameters, and selects, say, the third chain of Table 1, then he obtains two sets of data: A( ), A(I). Using the same numeric values, but selecting the second chain of Table 1 he obtains A( ) and A(M).

Alternatively, suppose that the social scientist wanted to study measurement error alone, but wanted to study the effects of different levels of measurement error on survey data. Let $M_j$ represent the $j$th set of numeric values for the parameters characterizing measurement error. Then, he could compare $A(M_1), A(M_2), \ldots, A(M_n)$ or compare $A(R, M_1), A(R, M_2), \ldots, A(R, M_n)$ or similar sets of data, whereby the values of the parameters in each set of data are identical except for those for $M$.

AN EXAMPLE OF A SIMULATION

A series of runs was made using the recursive equations model. In each run the population was static, 50 random samples of 100 respondents each were interviewed, and their responses were subject to random error and measurement error, but no social or ecological influence. Thus, effects R and M were present and the sixth path of Table 1 was used.

This survey made nine observations on each of the respondents. Three of these were of independent variables, two were of intervening variables, and four were of dependent variables. Specifically, the variables were (1) race, where black is “high,” (2) age, (3) education, (4) anomie, where low score is low anomie, (5) political participation, where low score is low participation, (6) attitude about students having a voice in university decision making, (7) attitude about student demonstrations, (8) attitude about war industries, and (9) attitude about the Vietnam war. For example, item (6) was worded:

Students claim that they want more of a voice, than they now have, in choosing faculty, in planning curriculum and the like. Do you feel that students have enough of a voice now, that they have too much of a voice now, or that they need more of a voice?
- have too much voice now
- have enough voice now
- need more voice

For item six, “need more voice” corresponds to a high score.

FIG. 3. Causal structure between variables of simulated survey.

The assumed causal structure is shown in Fig. 3. The relationship between the variables in a survey was expressed, in general terms, in equations (1) above. For this simulation, these equations were assumed to be linear.

The numeric values of the parameters remained constant for all of the 50 samples; these values are shown in Tables 2 and 3. Table 2 arrays the coefficients of the linear terms in the recursive equations. For example, the equation relating the score on item nine (a high score represented opposition to the war) to scores on the independent and intervening variables is

$$X_9 = .10X_1 - .60X_2 + .50X_3 + .30X_4 + .30X_5 + e_9'.$$

TABLE 2
VALUES OF COEFFICIENTS OF LINEAR TERMS IN RECURSIVE EQUATIONS.*

| Dependent variable | Independent variable 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 4 | -.20 | -.25 | .30 | | 0 |
| 5 | .20 | .20 | -.40 | 0 | |
| 6 | .10 | -.60 | .50 | 0 | 0 |
| 7 | .10 | .30 | -.50 | 0 | 0 |
| 8 | .10 | -.30 | .50 | .30 | .30 |
| 9 | .10 | -.60 | .50 | .30 | .30 |

* The numbering corresponds to the items in the text.

(Note that the numbering of variables, which corresponds to the items in the survey, is different from that indicated in equations (1) above. To be consistent with those equations, $X_1 = Y_{11}$, $X_2 = Y_{12}$, $X_3 = Y_{13}$, $X_4 = Y_{21}$, $X_5 = Y_{22}$, $X_6 = Y_{31}$, $X_7 = Y_{32}$, $X_8 = Y_{33}$, $X_9 = Y_{34}$, $e_9' = e_{34}$.)

As was mentioned above, five different observation errors could occur during the interview of any respondent. For this simulation, the probabilities of the occurrence of these errors were generated from a linear function,

$$\Pr(\text{error } q) = a_{q0} + a_{q1} Z_1 + a_{q2} Z_2 + a_{q3} Z_3,$$

where $Z_1$ represents race ($Z_1 = 0$ if white, $Z_1 = 1$ if black), $Z_2$ represents education ($Z_2 = 0, 1, \ldots, 4$ for the five education categories: less than 8, 8 to 11, 12, 13 to 15, 16 or more), and $Z_3$ represents age ($Z_3 = 0, 1, \ldots, 6$ for the seven age categories, 15-24 years, 25-34 years, and so on). Table 3 arrays the values of the $a$'s for each of the five types of errors. From the values of Table 3, the reader can create tables giving the exact probabilities of errors for each type of respondent. For example, the probability that a white, aged 45-54 with a 12th grade education, refuses to answer a question is .03 - .02(0) - .01(2) + .01(3) = .04. If these calculations result in a negative probability, then it is set equal to zero.

TABLE 3
COEFFICIENTS OF LINEAR TERMS IN FUNCTION WHICH GENERATES THE PROBABILITIES OF MEASUREMENT ERRORS.

| Error | $a_{q0}$ (constant term) | $a_{q1}$ (effect of race) | $a_{q2}$ (effect of education) | $a_{q3}$ (effect of age) |
|---|---|---|---|---|
| Not at home | .21 | .02 | -.01 | -.01 |
| Refusal | .03 | -.02 | -.01 | .01 |
| Terminates | .04 | -.02 | -.01 | .01 |
| False answer | .06 | -.02 | -.01 | .01 |
| Interviewer error | .01 | 0 | 0 | 0 |
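The linear function for the error probabilities, with the floor at zero, can be written out directly. The coefficients below are those read from Table 3; the dictionary and function names are assumptions made for this sketch.

```python
# Coefficients (constant, race, education, age) as read from Table 3.
ERROR_COEFFICIENTS = {
    "not at home": (0.21, 0.02, -0.01, -0.01),
    "refusal": (0.03, -0.02, -0.01, 0.01),
    "terminates": (0.04, -0.02, -0.01, 0.01),
    "false answer": (0.06, -0.02, -0.01, 0.01),
    "interviewer error": (0.01, 0.0, 0.0, 0.0),
}


def error_probability(error, race, education, age):
    """Initial probability of an error for one type of respondent.

    race: 0 white, 1 black; education: 0..4; age: 0..6 (category codes).
    Negative results are set to zero, as the text specifies.
    """
    a0, a1, a2, a3 = ERROR_COEFFICIENTS[error]
    return max(0.0, a0 + a1 * race + a2 * education + a3 * age)
```

The worked example in the text corresponds to `error_probability("refusal", 0, 2, 3)`, i.e. a white respondent with a 12th grade education in the 45-54 age category.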

The numeric values in Tables 2 and 3 have been arbitrarily assigned. However, there was an attempt to make the assignment consistent with the sociological knowledge that has been gained in opinion surveys.

In this simulation, the effect which random and measurement error has on survey results was studied. Only a few findings will be presented here. Consider item six. If the entire population could be interviewed without error, then the "true" proportion of people selecting the first alternative, i.e., "have too much voice now," to question six would be .527. When the 50 samples of 100 respondents were combined into one sample of 5000, then the proportion giving that answer was .3%. When sampling error and random error were present, the estimate was .510, and when all three errors occurred, the estimate was .517.
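The pooling of 50 samples of 100 into one sample of 5000 can be sketched as follows. This is a minimal illustration that includes sampling error only; it assumes simple random draws from a population with true proportion .527 and does not reproduce the random-error or measurement-error components of the simulation. The function name `draw_sample_counts` and the seed are hypothetical.

```python
import random


def draw_sample_counts(p, n, n_samples, seed=0):
    """Number of respondents choosing the first alternative in each sample."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) for _ in range(n_samples)]


P_TRUE, N, N_SAMPLES = 0.527, 100, 50

counts = draw_sample_counts(P_TRUE, N, N_SAMPLES)
estimates = [c / N for c in counts]        # one estimate per sample of 100
pooled = sum(counts) / (N * N_SAMPLES)     # single estimate from all 5000

print(min(estimates), max(estimates), pooled)
```

Because the samples are of equal size, the pooled proportion equals the mean of the 50 separate estimates.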

An examination of the distribution of estimates from 50 samples, each of 100 respondents, revealed little change as the effects of additional errors accumulated. With one exception the distributions did not differ significantly from the distribution that would be obtained if each sample were from a population in which variable six had a binomial distribution with p = .527 and if the normal approximation to the binomial was used to calculate expected values. The exception was the distribution arising when only sampling error and random error were present. This was judged to be a chance result, inasmuch as the difference between the expected distribution and the distribution for the case when all three errors occurred was not significant, and inasmuch as the differences between the expected distribution and the actual distributions for variables five, seven and eight were not significant.
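The expected distribution referred to above can be approximated as in the following sketch. It assumes a normal approximation with mean p and standard deviation sqrt(p(1 - p)/n), without a continuity correction. The class boundaries in `edges` are hypothetical, since the intervals actually used are not stated here, and the function names are illustrative.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal variable."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def expected_counts(p, n, n_samples, edges):
    """Expected number of sample proportions falling in each interval.

    `edges` are increasing interior cut points; the first and last
    intervals are open-ended.
    """
    sd = math.sqrt(p * (1.0 - p) / n)
    cdf = [0.0] + [normal_cdf(e, p, sd) for e in edges] + [1.0]
    return [n_samples * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]


se = math.sqrt(0.527 * 0.473 / 100)       # standard error for n = 100
edges = [0.45, 0.50, 0.55, 0.60]          # hypothetical class boundaries
counts_expected = expected_counts(0.527, 100, 50, edges)

print(round(se, 4))
print([round(c, 1) for c in counts_expected])
```

The observed number of sample estimates in each interval could then be compared with these expected values, for example by a chi-square goodness-of-fit test.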

As is well known, measurement error tends to reduce the association that exists between variables. Table 4 shows this result for these simulated data. The value of chi-square for the association between age, which is variable two, and attitude about students' role in university administration, variable six, was computed for each of the 50 samples under each of the three conditions: (i) sampling error, (ii) random error and sampling error, (iii) measurement error, random error and sampling error. These are shown in the last three columns of Table 4.

TABLE 4
DISTRIBUTION OF ESTIMATES OF CHI-SQUARE (ITEM 2, AGE, VS. ...

Columns: value of chi-square; distribution when only sampling error is present; distribution when sampling error and random error are present; distribution when sampling error, random error and measurement error are present.

For each sample interviewed without random error or measurement error, the estimate of chi-square was significant at the 5 percent level. With random error alone, 8 percent of the estimates became nonsignificant; with random error and measurement error, 36 percent of the estimates were statistically nonsignificant. Thus, there is a considerable reduction in the number of significant results once measurement error occurs, though this is, of course, dependent on the amount of measurement error.
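The attenuation of chi-square by measurement error can be illustrated with a small deterministic sketch. The 2 x 2 table below is hypothetical and is not taken from the simulated data. The error model, in which each recorded answer is wrong with a fixed probability `rate` regardless of the respondent, is a simplifying assumption and not the five-error model of Table 3. The function works with expected cell counts, so no random numbers are needed.

```python
def chi_square(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def flip_responses(table, rate):
    """Expected two-column table when each answer is misrecorded with probability `rate`."""
    return [[a * (1 - rate) + b * rate, b * (1 - rate) + a * rate]
            for a, b in table]


# Hypothetical sample of 100: rows are two age groups, columns two answers.
true_table = [[35, 15], [15, 35]]

for rate in (0.0, 0.2, 0.3):
    print(rate, round(chi_square(flip_responses(true_table, rate)), 2))
# 0.0 16.0
# 0.2 5.76
# 0.3 2.56
```

With one degree of freedom the 5 percent critical value is 3.84, so in this hypothetical table the association remains significant at an error rate of .2 but not at .3.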

CONCLUSIONS

The work reported here is illustrative of the kind of thing that can be done with this computer simulation of a sociological survey. One of the directions which this investigation is presently taking is an examination of the effects of varying the degree of measurement error. Thus, how much measurement error can an investigator tolerate and still obtain reasonably valid results? Other directions, in addition to those dictated by the objectives stated earlier, include an investigation of the problem of inferring the underlying causal structure given that the model imposed on the analysis of the data is different from the model which generated the data. For example, if an investigator conceives of the structure underlying a set of data as a Markov process, but the data were in fact generated by a recursive equations process, then can he nevertheless infer the "true" substantive relation between the variables?

Computer simulation of survey research will not and cannot replace the theoretical analysis of the statistician or the social scientist, but it can supplement that analysis and yield greater understanding of the dynamics of survey research.

(Manuscript received January 7, 1972)

Behavioral Science, Volume 17, 1972