DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082....

41
DOCUMENT RESUME MD 095 900 IR 001 082 AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing Subjective Textual' Information. Final Report. INSTITUTION Air force Human' Resources Lab., Williams AFB, Ariz. Flying Training Div. REPORT NO. AFHRL-TR-74-55 PUB DATE Jul 74 NOTE 40p. EDRS PRICE MF-$0.75'HC-$1.85 PLUS POSTAGE DESCRIPTORS *Education; *Information.Theory; *Instructional Technology; *Psychology; *Training ABSTRACT A study was made to derive An equation for predicting the "subjective" textual information contained in a.text of material written in-the English language. Specifically, this investigation describes, by a mathematical equation, the relationship between the "subjective" information militant of written textual material and the relative number of errors committed by a learner when asked to predict, letter by letter, the content of given textual material. The relationship shows that the subjective information of a given text for a specific learner is directly proportional to the number of wrongly-guessed signs made by that learner. This is expressed mathematically by: 1=3.1E; Where: I=Information in Bits; E=Number of wrongly gueiSed signs. The application of Shannon's guessing procedure (1951) in this study permits the measurement of the ',subjective information of a given text for a specific learner. The derived equation permits the measurement of information in tel.'s of a value that is dependent not only on the inherent qualities of the subject matter, but also on the internal state of the learner. (Author /. WCM)

Transcript of DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082....

Page 1: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

DOCUMENT RESUME

MD 095 900 IR 001 082

AUTHbr Kauffman, Dan; And OthersTITLE The Empirical Derivation of Equations for Predic'cing

Subjective Textual' Information. Final Report.INSTITUTION Air force Human' Resources Lab., Williams AFB, Ariz.

Flying Training Div.REPORT NO. AFHRL-TR-74-55PUB DATE Jul 74NOTE 40p.

EDRS PRICE MF-$0.75'HC-$1.85 PLUS POSTAGEDESCRIPTORS *Education; *Information.Theory; *Instructional

Technology; *Psychology; *Training

ABSTRACTA study was made to derive An equation for predicting

the "subjective" textual information contained in a.text of materialwritten in-the English language. Specifically, this investigationdescribes, by a mathematical equation, the relationship between the"subjective" information militant of written textual material and therelative number of errors committed by a learner when asked topredict, letter by letter, the content of given textual material. Therelationship shows that the subjective information of a given textfor a specific learner is directly proportional to the number ofwrongly-guessed signs made by that learner. This is expressedmathematically by: 1=3.1E; Where: I=Information in Bits; E=Number ofwrongly gueiSed signs. The application of Shannon's guessingprocedure (1951) in this study permits the measurement of the',subjective information of a given text for a specific learner. Thederived equation permits the measurement of information in tel.'s of avalue that is dependent not only on the inherent qualities of thesubject matter, but also on the internal state of the learner.(Author /. WCM)

Page 2: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

)3E T COPY. AMAMIAFHR.I.-TR-74-55

FOCE

110.1.1111PARTIAN NT OP HEALTH,

'EDUCATION a WELFARENATIONAL INSTITUTE OP

EDUCATIONTHIS DOCUMENT HAS IIE,EN REPRODUCED EXACTLY AS RECEIVED FROMTHE PERSON OR ORGANIZATION ORIGINATISIG IT POINTS OF VIEW OR OPINIONSSTATED DO NOT NECESSARIL,V REFIRESENT OFFICIAL NATIONAL INSTITUTE OFEDUCATION PDSiTION.OR POLICY

H PTHE EMPIRICAL DERIVATION OF EQUATIONS FOt

SUBJECTIVE TEXTUAL INFORMATION

'AN

R

S

By

Dan KauffmanMike JohnsonGene Knight

Arizona State UniversityTempe, Arizona 85281

FLYING TRAINING DIVISIONWilliams Air Force Base, Arizona 85224

July 1974

Approved for public release; distribution unlimited.

LABORATORY

AIR FORCE ,SYSTEMS COMMANDBROOKS. AIR FORCE BASE,TEXAS 78235

Page 3: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

NOTICE

.

When US Government drawings, specifications, or other dap. are usedfor. any purpose other than a definitely related Governmentprocurement operation, the Government thereby incurs noresponsibility nor any obligation whatsoever, . and the fact that theGovernment.. may have formulated, furnished, or in any way suppliedthe said drawings, specifications, or other data is not to be regarded byimplicatioin or otherwise, as in any manner licensing the holder or anyother person or corporation, or conveying any rights or permission tomanufaciure, ust., or sell any patented invention that may in any waybe related thereto.

This final report was submitted by Arizona State University, Tempe,Arizona 85281, under contract F41609-71. 00027, pr. ject '1123, withthe Flying Training Division, Air Force Human Resources Laboratory(AFSC), Williams Ai: Force Base, Arizona 85224. Captain Gary B 'Reid,Flying Training Division, was the contract monitor.

This report has been reviewed and cleared for open publication and/orpublic release by the appropriate Office of Information (01) inaccordance with AFR 190-17 and 1 5230'.9. There is no objectionto unlimited distribution of this ieoort to the public at large, or byDDC4o the National Technical Information Lervice (NTIS).

This technical report has been reviewed and is approved.

WILLIAM V. HAGIN, Technical Director. .

Flying Training Division

Approved for publication.

HAROLD E. FISCHE., Colonel, USAFCommander

Page 4: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

UnclassifiedSECURITY CLASSIFICATION OF THIS PAGE (Whorl Data Entered)

REPORT DOCUMENTATION PAGEREAD

BEFORE COMPLETING FORM

I. REPORT NUMBER

AFHRLTR74.55

2. GOVT. ACCESSION NO. 3. RECIPIENT'S CATALOG NUMBER

4. TITLE (and Subtitle)

THE EMPIRICAL DERIVATION OF EQUATIONS FORPREDICTING SUBJECTIVE TEXTUAL INFORMATION

'

5.. TYPE OF REPORT & PERIOD COVERED

Final

6. PERFORMING ORG. REPORT NUMBER

7. AUTHOR(s)Dan KauffmanMike JohnsonGene Knight

S. CONTRACT OR GRANT NUMBER(s)

. .

F41609.71-C0027

9. PERFORMING ORGANIZATION. NAME AND ADDRESS

Arizona State UniversityTempe, Arizona 85281 .

10. PROGRAM ELEMENT. PROJECT, TASKAREA & WORK UNIT NUMBERS

r

11230205

II. CONTROLLING OFFICE NAME AND ADDRES:i

Hq Air Force Human Resources Laboratory (AFSC)lo

Brooks Air Force Base, Texas 78235.

.

12. REPORT DATE

July 197413. 'NUMBER OF PAGES

40

14. MONITORING AGENCY NAME & ADDRESS(!( different from Controlling Office)Flying Training Division ..

..

Air Force Human Resources Laboratory

.crWilliartis Air Force Base, Arizona 85224

.

'

15. SECURITY CLASS. (of this report)

Unclassified

15a. DECL ASSI FICATION/ DOWNGRADINGSCHEDULE

16 DISTRIBUTION ,STATEMENT (ol this Report)

Approved forfor public release; distribution unlimited. .

, .

7. DISTRIBUTION STATEMENT (of :he abstract entered In Block 20, If different from Report)

13. SUPPLEMENTARY NOTES .

This report was approved (74-58. 9 April 1974) for Publication in the Instructional Science, an internationaljournal.

19. KEY WORDS (Continue.on reverse side if necessary and Identify by block number)

.information theory #psychologyinstructional. technologyeducation .

training

20. ABSTRACT (Continue on reverse side If necessary and identify by block number)Parameters pertaining to information processing by human beings have, in the past, been determined by

learning and memory experiments with nonsense syllables, number sequences, etc. However, in the real world we areconcerned with the processing of "meaningful information" not senseless texts, Therefore, the determination ofsubjective information directly from meaningful material becomes extremely important in the instructionlearningprocess.

This ,study derives an equation for predicting the subjective textual information contained in a text of material:written in the English language. Specifically, this investigation describes, by a matheinatical equation, therelationship between the subjective information content of written textual material and the relative number of errors

DD 1 JAN 73 1473 EDITION OF 1 NOV 65 IS OBSOLETEUnclassified

SECURITY CLASSItrICATION OF THIS PAGE (When Data Entered)

Page 5: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

Unclassified

SECURITY CLASSIFICATION OF THIS PAGE(Whon Data Entered)

, Block 2t.) (continued)

committed by a learner when .asked to predict. letter by letter, the content of given textual material. Thisrelationship shows that the subjective information of a given text for a specific learner is directly proportional lo thenumber of wronglyguessed signs made by that learner. This is expressed mathematically by

Whe

1 3.1 E

re: I = Information in Bits

E = Number of wrongly guessed signs

The application of Shannon's guessing proce'dure (1951) in this study permits the measurement of thesubjective information of a given text for a specific learner. Unlike senseless texts, the subjective information of ameaningful text varies from learner to learner. Therefore, the derived equation permits the measurement ofinformation in terms of a value that is dependent not only upon the inherent qualities of the subject matter, but alsoupon the internal !tate of the learner.

The dethe German I

rived equatIbn fOr the English languag.: is then compared to an equation derived by Weltner (1967) foranguage and found to be remarkably s nilar.

UnclassifiedSECURITY CLASSIFICATION OF THIS PAGE(When Data Entered)

Page 6: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

PREFACE

This study was Conducted under Project 1123, USAF Flying Train-ing Development, Task 112302, Instructional Innovations in USAF FlyingTraining.

- The research was carried out under the provisions of contractF41605-71-00027.by the Educational Thchnology Department of ArtzonaState University, Tempe, Arizona 85281. Contract monitor was CaptainGary B. Reid.

4

Page 7: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

BEST COPY &RABLE

TABLE OF CONTENTS

LIST OF TABLES

Pale

LIST OF FIGURES5

.INTRODUCTION 7/

Objective 7 .

Rationale 7

Instruction Learning Process -8.

Quantifying Information 8Examples of Guessing Procedlre '11

Procedures and Results 12

PHASE I: COMPUTER PROGRAM DEVELOPMENT . . . . ..... . . . . 13.

Proge6m CYBER4 13

Program CYBER2 13

Program EQUFIT 14

Program PTAFIT 15

PHASE II: SELECTION AND PRESENTATION OF TEXTUAL INFORMATION . . ,15

Experimental Subjects 15

PHASE III: CALCULATION OF SUBJECTIVE INFORMATION 17

PHASE IV: DETERMINAT1ONOF EQUATIONS 17

PHASE V: SELECTION OF "BEST-FITTING" EQUATION

The English Equation 14

The German Equation 34

CONCLUSION 36

REFERENCES 37

3

Page 8: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

LIST OF TABLESova

Table'

. .

1. X and Y Coordinates for Hypothetical Sample

2. X and Y Coordinates for Group .3

3. Least Squares Curve Fit - EQUFIT Program

4. PTAFIT Least Squares Curve Fit r

5. Equation Type 6: YwX/(A+B*X

6. Equation Type 1:i YagA+(B*X)

.1)221

14

18

19

20

22

22.

7. 'Comparison of Indices of Determination 23

8. Equation Type 1: YA+(B*X) 24-

9.' Equation Type 6: YsiX/(A+B*X) 25

10. Outlier Test Using Chazal's Method 26

11. Outlier Test Using Chazal's Method 27

12. Least Squares Curve Fit with Outlier Removed . . . 28

13.. Equation Type 3: Ys.A*(X+8) - Least Squares"CurveFit with Outlier Removed 29

14. Equation Type 1: Y..A+(B*X) - Least Squares CurveFit with Outlier Removed 30

15. Comparison of Indices of Determination withOutlier Removed 31

16. 'Equation Type 1: Y=A+(B*X) - Summary Table withOutlier Removed 32

17. Equation Type 3: Y=A*(X4 - Summary Table withOutlier Removed 33

4

Page 9: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

BESTcon AVAILABLE

LIST OF FIGURES

Figure, 11

1. SimpliCed View of Communication System 9.

2. Plot of English and German equations 21

0

Page 10: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

BEST COPY AVAILmLL

THE EMPIRICAL DERIVATION OF EQUATIONS FOR PREDICTING

SUBJECTIVE TEXTUAL INFORMATION

ts.

Introdu.ztlon

This stydy was designed to empirically derive an equation f8rpredicting the subjective textual information contained in a text ofmaterial written in the English language. Specifically, this investi- .

gation describes, by a mathematical equation, the relationship betweenthe subjective InfOrMation content of written textual material and therelative number of errors committed by a learner when asked co predict,letter by.letter, the content.of the textual material.

Rationale

Parameters pertaining to information processing by human beingsPave; in the past, been determined.by learning and memory experimentswith nonsense syllables, number sequences, etc. Because of the rela-tive simplicity of these types of experiments, the flow of informationand the subsequent information processing of these senseless texts'could be measured and varied according to exact prescriptions. However,in the real world we are,concerned with the processing of "meaningfulinformation.," not senseless texts. When the parameters are determinedfrom senseless texts uncertainty is contained in the extrapolation tomeaningful materials. Therefore, if we could determine the subjectiveinformation directly from meaningful material, we would most certainlyreduce these uncertainties.. As a result, one would be able to answerquestions of the following type: 0

1. What is the amount of information contained in the lawsof thermodynamics for a specific learner?

2. How much information does a learner gain, in a certainperiod of instruction, under a given set of Instructionalconditions?

3. How great is the flow of information through a particularinstruction-learning.channel? Is it too much? Is it toolittle?

h. How much does.the learner already know?

5. How much does the learner need to know?

7

Page 11: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

tot usi wooInstruction Leaftlina_Process

instruction and learning are two vital aspects of the educationalprocess that depend upon the manipulaticn of three activities: theinput of meaningful information; the processing of this information;and the output of meaningful information. These three activities forMthe basis for the science of information theory.

The science of information theory, since it deals with the fun-damental processes involved in the instruction-learning process, pro-vides the quantitative instruments needed to describe this process.

When we discuss the instruction-learning process we are essen-tially describidg a communication system. Typically, a communicationsystem can be described in terms of the following (see Figure 1):

Information source: The mechanism that selects a desiredmessage. out of a set of possible messages.

Transmitter: Encodes the message into a signal.

Channel: The medium through which the signal is transmitted.

Receiver: Accepts and decodes the transmitted signal intothe message.

Destination: Interprets the message.

Noise: Unwanted additions imposed on VA message. Theseadditions can be either from an external source, inter-nal source or combinations of the two. Noise changesthe intended message..

Quantifying Information

Since instruction is concerned with the transference of infor-mation and learning is the act of .(or the result of) the informationtransfer, it is obvious that we are not only describing a communicationsystem, but also an instruction-learning system. The information sourceand transmitter being the teacher, textbook, computer, etc., while thereceiver is the learner. Because our instruction-learning system is eninformation procesFing system, we are now faced with several questions:

4

1. What is information?2. How can we measure information?3. What are the characteristics of an efficient coding process?4. How could we measure the capacity of a channel?

8

Page 12: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

0, BEST COPY AVAILABLE

1;- Information Source

2 - Transmitter3 - Channel

4 - Receiver5 - Destination6 - Noise

Figure .1 .--Simpl ified View *of Communication System.,

9

Page 13: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

Aug ABLE

5. What are the general characteristics of noise?6. What are the effects of noise on a message?7. How do we increase or reduce the effects of noise?

All of the questions are intriguing and deserve attention. How-ever, this investigation addresses itself only to the first two ques-tions. The other questions provide a basis !or additional studies alongwith extending the results of this investigation.

Many definitions of information are given in the literature(Ash, 1965; Edwards, 1964; Frank, 1962; Fuchs, 1968; Guilbaud, 1959;Khinchin, 1957; Singh, 1966; Weltner, 1970; & others). But one appro-priate for this investigation is the definition given by Weaver (1969).He says, "Information is a measure of one's freedom of chnice when one'.selects a message." This conveniently ties the definition of informa-tion and its measurement directly together.

To take the simplest case of twv) alternative messages (yes orno), we will say that the information associated with this ensembleof two messages is unity. The concept of information, then, appliesnot to the individual message but to the ensemble as a' whole. '

To oe'more precise, we; an define the amount of informhtion bythe logarithm of the number of available choices (messages) containedin the ensemble. Thus, in our case of two messages (yes or no), theinformation is proportional tolthe logarithm of 2 to the base 2 suchthat: 1og22 u 1 which is unit. This unit of information is oajledia "bit" from the two words binary digit. To generalize then, l'f7Vill..dhave 8 alternative messages, among which we are equally free to choose,we have log28 = 3 giving us an/ensemble consisting of 3 bits of infor-mation.

This situation is completely adequate if we are only dealingwith messages than have an equally probable chance of being selected.The probability of this occurring in language structure is extremelysmall and is zero for the/English language. In this study we are con-cerned with the information content cF the English language which iscowposed of messages (letters, signs) arranged into stochastic ensem-bles. These ensembles are arranyed according to certain probabilitiesin which the probabilities depend on previous events.

Because of the probabilistic dependencies of the message (let-ters), it becomes exceedingly more difficult to measure the amount ofinformation in the English language. However, Shannon (1951) deriveda "guessing procedure" for analyzing the syntactical information contentof meaningful texts. His guessing procedure enables one to calculatethe subjective information of a text for a specific learner. Thelearner attempts to guess the ensemble, letter by letter. After each

Page 14: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

BEST COPY AVAILABLE

guess he is told whether his guess v4s correct or not. If incorrect,

he guesses again. If correct, he guesses the next letter or message

of the ensemble until the entire ensemble has, been derived. This pro-

cedure is based on the assumption that at each guess the learner-will

name the letter with the highest subjective probability. Thus, we have

.a measure of the amount of information a particular ensemble has for_a_

particular learner. Shannon's equation forms the basis for this inves-

tigation and is presented later. An example of the guessing procedure

and subsequent analysis follows.

Example of Guessing Procedure

rhe guessing procedure enables one to empirically estimate the

subjective information of a text for a particular subject. The subjec-

tive information Lan be cakulated ac'ording to several techniques.

However, each .echnique assumes that the subject guesses the--.sign with

the highest subjective probability. Consider the following hypothetical

example:

THE COW JUMPED 0

612 1 421 1 811111 f 3

Under each letter of the text is the number of guesses the subject made

until the correct sign (letter) was guessed. From these numbers it is

relatively easy to calculate the information content of the -text by

adding the information value of the signs. The first letter (T) con-

tains 1o926 bits of information, the second letter (H) 1og21 bits of

information. Therefore, the first letter (T) contains 2.58 bits(1og26 a 2.58) whereas the second letter (H) contains 0 bits (log21 a 0).

While the first letter did contain information for the subject (2.5.8

bits), the second was perhaps expected with certainty and was, therefore,

devoid of any information (0 bits). The total information of the text

can now be measured in bits as:

H(text) m 1og26 + 1og21 + 1og22 + 1og21 + 1og24 + 1og22

+ logol + log21 + 1og28 + 1og21 + 1og21 + log21t

+ log21 + 1og21 + 1og21 + 1og23,

= 1og28 + 1og26 + 1og24 + 1og23 + 2 1og22 + 10 1og21

. 3.00 + 2.58 +.2.00 + 1.58 + 2.00 + 0

= 11.16 BITS

Page 15: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

PRE CRY AVAILABLE

Dividing this by the length of the 4.xt N = 16 we arrive at .698 Bits/Sign forhe information content of our sample text.

Although this method is relatively simple, it does not providethe best estimate of the information content. Shannon.(1951) derivedequation (1) which according to Frank (1962) is not only a better mea-sure but also is closer to the actual figure than other calculations.would indicate.

H(text) = E (r ld(r) - (r-1) ld(r -1))

Where: N = amount of information measured in bits

Nr= absolute frequency with which the number r occurs

.

in the guessing sequence of .N signs

Id-= logarithm to the base 2 (log2r)

Applying Shannon's equation to our example, we obtain 18.248 bits asthe information of the text segment or 1.405 bits/sign which is abetter estimate than our previous figure of .698 bits/sign.

Procedures and Results

This study consists of five procedural phases of operation.They are::

'Phase I: The design and development of computer programsto present and process the experimental data.

Phase If: -The selection and presentation of textual infor-mation to selected subjects and the collectionof their responses.

Phase III: Calculation of subjective information contentcontained in the textual material for selectedsubjects.

(1)

Phaie IV: Derivation of equations to fit experimental data.

Phase V: Selection of "best-fitting" equation, and deletionof outlier point.

12

Page 16: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

la

PHASE I

eEsr colt *Novo

Computer Program Development

Two computer programs were written specifically for this inves-

tigation. Two other programs were used in the synthesis of the data.he programs, written in BASIC, were designed to operate in a time-

shared environment.

Program CYBER4

Program CYBER4 was designed as a general purpose program to pre-sent seletted textual passages to the subjects. CYBER4 can present

textual information by letters, wordi, phrases, sentences, paragraphs,or any combination thereof. The content presentation can be predeter-mined by the investigator or the program can be adapted to functionunder learner control. In addition, it can also be adapted to displayinformation at any intellectual level.

The program initially outputs an extended phrase or set ofphrases. The length of the phrase is pre-programmed by the investiga-tor and, for the most part, depends on the complexity of the materialand the relative. sophistication of the subject with regard to the

.academic content.

After the initial phrase is printed by the computer terminal,the subject is required to guess, letter by letter, the subsequentmessage. If the subject is incorrect, his response is placed in a"response file;" a zero is recorded in a "score file," and the subjectis required to guess again. This process continues until his response

is correct.

If the subject's guess is correct, his response is placed inthe "response file;" a one is recorded in the "score file;" his cor-rect answer is. acknowledged by the computer; and the next letter inthe guessing sequence is presented. When the subject has respondedto.the entire text, he is thanked for his effort and the time of com-pletion is recorded by the computer.

Program CYBER2,

Program CYBER2 was developed to calculate the amount of infor-.

.matIon for each subject, using equation (1) derived by Shanhon (1951).

13

Page 17: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

As can be seen. in the sample output listed in Table 1, CYBER2outputs the percentage of incorrect responses (X s) per subject and theamount of information per subject (Yss) contained in the text for thatparticular subject. The measurement of information is in units ofbits/sign, the sign being the message unit letter. Each pflir of numbersrepresent the X and Y coordinates used in fitting an equation to thedata. In addition to calculating the X and Y values, CYBER2 also ordert.the results from the lowest to highest on the percentage of incorrectresponses (Xss

TABLE I

X AND Y COORDINATES FOR HYPOTHETICAL SAMPLE

Percent Incorrect (Xss) Bits/Sign (Yss)

X( 1) = 22.2222 Y( 1) - .9148X( 2) = 25.0000 Y( 2) - .5629X( 3) = 26.0000 Y( 3) -1.3686x( 4) = 60.0000 yc 4) 1.8500

Program EQUFIT

Program EQUFIT is a commercially developed program available onthe MG-255 time-sharing compUter system at Arizona State University.EQUFIT applies curve fitting techniques to determine which of six generaltypes of curves best fits the supplied data. The six general types ofcurves range from a linear function to a hyperbolic function.

The X and Y coordinates generated by Program CYBER2 are used byEQUFIT to determine the equations that have the l'best fit" for the sup-plied data. EQUFIT outputs the following information for each equationspecified by the user:

1. Index of determination for each equation

2. Coefficients for each equation

3. X and Y values as supplied by user

4. Calculated values of Y for each specified equation

5. The percentage difference between the/actual Y value andthe calculated Y value for each tpecified equation.

14

Page 18: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

Program PTAFIT.r,

PTAFIT is vmodified and improved version of EQUFIT developed byPlan-Test Associates of Phoenix, Arizona.* PTAFIT greatly extends the

capabilities oEQUFIT. In addition to the information supplied byEQUFIT, PTAFIT supplies the following:

BESTCOPYAMIIABLE

1. The level of significance at which the equation fits thesupplied data.

ti

2. The value of each coefficient, the standard deviationassociated with that coefficient and confidence limitsfor the coefficieht. The confidence limits are chosenby tile user and can be at the 90%, 95;, and/o the 99%level of significance.

3. Confidence bands about each (X, Y) coordinate. The user

maY'specify the 90%, 95%, and/or 99% prediction limits.

4. Evaluation of how significant is the apparent differencesbetween equations.

PilASE II

Selection and Presentation-of Textual Information

The textual information for this investigation was selectedfrom two areas. The first area was concerned with academic facts inthe areas of information science. All participants had either priorexperience in the academic field or were enrolled in an informationscience course. The second area was concerned with current events.The most topical of current events, at the time of the investigation,was the Watergate affair. It was being presented daily on televisionand in the news media. It was thought that all subjects would beknowledgeable in this area; however, this was not the case. A greaterpercentage of incorrect responses was made on the Watergate textualmaterial than on the information science textual material.

Ex erimental Subjects

A total of 239 subjects were available from the population ofgraduate students in the College of Education. An analysis of the

*Plan-Test Associates, Financial Center, Suite 609, 3443 NorthCentral Avenue, Phoenix, Arizona 85012.

15

Page 19: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

GOPY WAKE

subjects' academic background indicated that all participants werefunctioning within the normal range of"intelligence. All subjectsused in this investigation were enrolled in College of Educationgraduate classes in an uncontrolled manner typical of college enroll-ment. It Is assumed that all subjects used in this investigation arerepresentative, in terms of language skills, of the population ofcollege graduates whose primary language, is English.

Out of the 239 subjects available to participate in this inves-tigation 118 were selected. These 118 subjects were divided intothree groups based upon their academic, background.

Croup 1: 25 subjectsA test group to validate the computer programs andselection of the textual information.' The dataderived from these subjects was not used in thefinal determination of the equation.

Group 2: 59 subjects

An experimental group of subjects who were exposedto the two types of textual materials. Group 2subjects were of varied academic background andtherefore considered to be less reliable in termsof their usefulness as a homogeneous group. Thetextual material presented to this group consistedof both academic materials (Information Science)and current material (Watergate affair). The data :

generated by Group 2 was suspect because of numer-.ous computer failures throughout the, experimentalperiod and was therefore rejected.

Group 3: 34 subjects

An experimental group of subjects who were equiva-lent in terms of academic background in the fieldof information science. All of the Group 3 sub-jects were concurrently enrolled in an informationscience course at the time of the study. Thetextual material presented to this group wasdirectly concerned with previously learned mater-ial and material they were about to study in theirclass. This group provided the most reliable datafor the final determination of the derived equation.

16

Page 20: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

BEST COPY AVAILABLE

PHASE III

Calculation of Subjective Information

After all subjects had completed the guessing toquinces, ProgramCYBER2 processed the "score file," calculated the percentage of incor-rect responses and computed the subjective,'textual information foreach subject according to equation. (1):

H(text) = E Nr (r .1d (r) - (r-1) ld(r-1)) (1)

Table 2 presents the results derived, from test Group 3 as calcu-lated By Program CYBER2. This table shows the percentage of incorrectresponses and the corresponding information content for each of thesubjects. The table indicates that, in general, as the percentage ofincorrect responses increases the amount of information increases.This Aends to confirm the theoretical predictions.

PHASE IV

Determination of Equations

After all subjects had completed the guessing sequences, programCYBER2 processed the "Score File," calculated the percentage of incor-rect responses and computed the subjective textual information for eachsubject according to equation (1). This provided the necessary infor-mation for EQUFIT to process.

Program EQUFIT'provides a least squares curve fit for the datagenerated in Phase III by program CYBER2. The six curve types andresults are displayed in Table 3. The index of determinatior indicatesthat equatibns 6, 3 and 1, respectively, provide a fit that is "qUitegood."

Examination of Table 3 would tend to indicate that equation (6)is the best fit. Although this equation shows the highest Index ofDetermination, the other two equationl are so close that it is difficultto judge which is the best fitting equation using only the Index ofDetermination.

Ope would prefer the linear function because of its ease ofusage, but only if it proved to be "as good a fit" as the other.func-tions. The millatiOn derived by Weltner (1967) for the German,language

17

Page 21: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

. Table 2

X AND Y COORDINATES FOR GROUP 3

Percent Incorrect Bits /Sign

x( 1) = 7.6923 Y( 1) .2281x( 2) 4. 10.0000 Y( 2) . .2664x( 3) = 10.0000 Y( 3) a ..2897x( 4) 12.5000 Y( 4Y as 3255x( 5) - 15.0000 Y( 5) / .

x( 6) - 15.0000..4942

6) .5328x( 7) = 15.3846 Y( 7) . .4238x( 8) = 15.3846 Y( 8) , 3077x( 9) = 17.5000 Y( 9) .4821x( 10) ' 17.5000 Y( 10) mi .4724x( 11) = 20.0000 Y( 11) a. .5407x( 12) = 20.0000 Y( 12) 1. .6020x( 13) = 20.0000 `I( 13) .5264x( 14)

x( 15)

= 20.0000= 20.0000

Y(

Y(14) a .5286,

15) so' :5627x( 16) = 20.0000 `I( 16) .5698x( 17) = 21.4286 Y( 17) .5364X( 18) = 22.5000 Y( 18) .6406x( 11) = 27.2727 X( 19y - .6918x( 20) = 27.5000 Y( 20) 1. .9640x( 21) = 27.5000 Y( 21) = 1.0123x( 22) = 27.5000 Y( 22) 9593x( 23) = 27.5000. Y( 23) .8112x( 24) = 32.5000 Y( 24) , .9116X( 25) = 35.0000 25) 1. 1.2194X( 26) m' 35.0000 Y( 26) = 1.2905x( 27) = 35.0000 Y( 27) a 1.2210.X( 28) = 37.5000 Y( 28) = .7500X( 29) = 37.5000 Y( 29) = 1.2191X( 30) = 38.4615 30) = 1.1554X( 31) - 40.0000 Y( 31) = 1:2366x( 32) 42.5000 32) = 1.3555x( 33) = 45.4545 Y( 33) = 1.2501x( 34) 56.2500 Y( 34) mi 1.5010

18

Page 22: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

11W

BEST COPY AVAILABLE

1

TABLE3

LEAST SQUARES CURVE FIT

EQUFIT PROGRAM

In ea7cir

Curve Type Determination A

1. Y=A+(B*X) .892389' -1.49193E-02 3.02491E-02

2. Y=A*EXP(B*X) .851611 .229436 4.19008E -02

3. Y=A*(X411) .922853 2.41865E-02 1.05825

4. Y=A+(B/X). .693833 1.35166 -12.2042

5. Y=1/(A+B*X) .711047 3.48035 - 6.92674E-02

6. Y=X/(A+B*X) .930395 36.2543 - 5.09608E-02

19

Page 23: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

.

BEST COPY AVAILABLE

is of a linear nature (see Figure 2). This would lead us to suspect. that the English language equation might Also exhibit similar linear'properties. Therefore, it was decided, at this point, to further ana-lyze the Group. 3 data. This is done in Phase V. .

PHASE V

Selection of "Best- Fitting" Equation.

Phase V, in part, utilized the progrim PTAFIT to further analyzethe equations to determine which would be the most appropriati todescribe the empirical data.

Table 4 shows the leastcurves fit generated by program PTAFIT.All equations.are shown to be significant at the 99% level of confi-dence.

TABLE 4

PTAFIT LEAST SQUARES CURVE FIT

Curve Type Index of

DeterminationA B Significance

90% 95% 99%

V 1*k+(B*X) .8924 -1.494E-2 3.025E-2 * * A

2 Y=A*EXP(B*X) .8517 .2294 .0419 * * . *

3 Y=A*(XtB) .9219 2.418E-2 1.058 * * *

4 Y=A+(B/x) .6918 1.353 -12.2 * * *

5 y=1/(A+Px) .7ri1 3.48 - 6.927E-2 * * *

6 Y=X/(A+B*A, .9304 36.26 - 5.097E-2 * * *

Number of Observations: 34

Equations 6, 3 and 1 again exhibit the highest Index of Deter-mination and therefore are prime candidates for further examination.Tables 5 and 6 show the standard deviation and confidence limits forthe coefficients of equations 6 and 1.

20

Page 24: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

4.ZZ

3.EZ-

3.ZZ-

2.ZZ

PLOT OF: Y(E) = .03 IX - .031

(ENSLI51-1)

Y: G) = .1,4312X - .EBE

CEERMN)

ENSLIEH EURTiLN nY: KRIFFrEn

EE BEN

ERSTION EY: NELTNER

=r

rze

riin

LC

N OF LNG:PRP-CT RESPONSES

Figure 2.--Plot of English and German equations.

c3

/ Yu).-

.

1!

YCE)

Page 25: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

TABLE 5

EQUATION 'TYPE 6

Y=X/(A + B*X)

or.

BEST COPY AMIABLE

Coefficient Value Standard 95%Deviation Confidence Limits

A 36.26 ''1.75 32.68 39.83

B - 5409E-2 9.52E-2 - .24 .14

TABLE 6

EQUATION TYPE 1.

Y=A+(B*X)

Coefficient Value Standard 95%Deviation. Confidence Limits

A -1.49E-2 5.20E-2 .12 9.11E-2

B 3.02E-2 1.85E-3 2.64E-2 3.40E-2

22

Page 26: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

Bra copy

Tab:3 7 shows the PTAFIT output comparing the fit Of all pairsof equations. It shows that equation 6 does not.fit significantlybetter,than equation 1. Therefore, since equation 1 represents astraight line it will be given special consideration inviewof itsconvenience of interpretation.

, TABLE 7

COMPARISON OF INDICES OF'DETERMINATION

95". CONFIDENCE

2 :3 '5 E.

A_31..2

4

99% CONFIDENCE

CURVE' NO.1 2 3 4 5 6Ob.*

. 0.C WOO. 0

4

r ALTERISKS INDICATE PAIR:: OF CURVES RRE'THE(NOT S: "31 TO HAVE SIGNIFICAOTLY DIFFERENT INDICU!DETERMINATION.AT THE STATED CONFIDENCE LEVEL:).

=" 11 me.pl`I 8 and 9 show further output from the PTAFIT program. Theint with the coordinates of 37.5 and)0.75 appeared suspiciously dev-'

iAnt and an outlier test run on the residuals confirms this suspicion.,The outlier test was run using a Plan-Test Associates program based ona method reported by Chazal (1967). Thegraphical rand tabulaoutputf om this'program are shown in Tables 10 and 11, respectively.

In view of thii, PTAFIT was rerun omitting the outlier. Theresulting outputs are shown in Tables 11 through 17. We'now find thatequation 3 fits better than equation 6 (see Table 12), but still neither'is significantly better than equation'l (see. Table 15).

23

Page 27: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

1 ...... t

it -C; -i. ..:. .J.) C. 1, '. f.....1 1...? (....., C.? i'' r': f ,) IN) ro roro ro r...' ro ro r....o rt) "..: :s,Cr, 0 r0 .= C.) '' 1 i -.-.1 Ln i_n LC ro -.) -.4 -.I -.4 -.4 ro -- .-..-. -4 -,i e_n ...n e_n i_n ro c-. c.

i . . c...-......-.....r.:_ '_n . 4 e _n e_n i_n e_r. '_fl .;_n e..1 r. til -: ,r fn io.,. f_n - .1 -,i_n Ln ..

1

-4 ,..; CO e_n ro .7.1i

,10 LnCT

. 11. ..-

ro

17, O-CCt 4. 9--j tfi 6.f.

'4) 1:' f: CT, er Cr 121 Crl 1_01 -: e_n ro roCr} Et t:r. -,1 rt: Cf:. .

)- .4) .- o) r.... ro rt. e_n .4) C7,r- 06 .. CC, CC, ro -: --

4b,

-'7 f0 '.:-, ' 1:1 *..:, '4:1 '.f.' 1.: '-'6 '''' (..11 1. ,...) 1.0 Cr, CC, CC ''' I

''''' 4- 4. 4-. .... Cr, C; c*, 17, -,..n io.-..--..-. .-..--..-. 46 -: C.:, ..f., CO C.: ' -,1 ---.1

4) -: -;.. 41. I-0 ..1:. ..1) ..1:1 -.L. - -2 (.0 ' 4. W 4: 011'...) rij Lr 0 -4 Itr-

I t 1 1 1 1 1 1 1 1 1111111 1 1

- CO Cr% (0 CO (0 r0 1'0 0.- (0 ro CO (0 4. 0) CN (0 ro -4- .7, co f..! CO -4 -A -U ck n) N

,c7.. . 0) . cr.. -4 .--. cr. N cr,

c.. ..1) 17, f_;t -4 CO it ,7 a J = n

Cr% - r..: CC4 CC, CT, CO o) .4) .4) ro 4 -A 4. C.) CO

' CO CO T., -.4 -4 -4 -4 o W 0 fj.) ej)

j '7.1,Y, 0) OD 0) .7% T. Cr. CIN _Tij I,,iv Ii W 0 '.1) ti, :0

cD n! (J! fJJ o LI o o (0 70 ri!e!) .4.) 0) 0) CO 0) .4 0 C. 0 .4 Cr.. Cr. (7% cr. Cr% CO tit 6-,

e_n

CNnj

.

4- nJ-1 o- ! -

(.! 6--

.1

RI ro LI

ro 4 -4- CO ':c CO cr.M M M1 1 i

ro rCi .4

.4) It CO 0) CO CO .)) CO =1 j J J Cr. Cr. Cr.. e_fl=. .76 =. - 4.. 4. 4. 4. Cr, Cr, =. 4 IX* :3

cr, 4. 4: 4: 4: -.J- - t_n io '_n CO 0) C0 .4) o o ro

Page 28: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

.

S.

o 4: 4. 41 : 0: tO 0) 0: o) ro n) n) ro n) n) ro ro ro ro'n) n)cr. .21 ro c. co -A -A o o n) -..j -A 7A -J -..j N. .--. -A -J.._n ro c.

ro 4. 0 4. o Ln o n) Ln 4: o o o)o 0 5t

- - - -. . . . -. ...C. oi 0 0 cr, ,21 Ln 4. 4. 0: 4. e_fi

ro *- r0 r0 r0 r 0 - =. CT: s_n -U 510 10 r0 =. Cr: CC. =r r. 0 ro.J1 .21 - - 4. .- I) Ct. r.) .4) ro ro ro 0., ro 4. .21 '

Ln *L. q) - - rV r: 0 0) cr. 41, cr, 41 -A 0) -A 4. P- -A 0) CQ P: .J1 -1 4. -

. . . . -A -A -.1 CT: CT, o !_ri -U 4: 4. 4. 4: t_.1 r,.,

r.7 iv. 1:0 .X CO .=. .7. .7.. CU. .7. %T.. 57. 1) 1) 1,1 r,_, Is., - -

t 41 CT". 1.1 - - LI '4:1 ..1.1 -A -A --.1 4. ' ry 1) fl (1e_r1 r ro ._n CT, CT: Cr, 1) - 0) CO LIT Is

II.- cr. co '_n rV

cr. -A 0) q)re CT. -

- .1)

o) ro nJ ro ro ro n) ro nj 4. .- c7.. -A Cr. 41 41 ro ro ry -A 4. Clr,co ro o) co .4) cr. Clr. '

,.1:1 Cs) 4. co nj -A 0) Cr'. .21 Ln n) o -ACT. c .11 n) .-. ..Ln o o) -A CY. Li' 41 4. CT. 4. CO c. CT. c. .1) o o -J 4.4 Cr. 0 u, -A .10 .4) 4. -1 ro cr. o) ro 4. o .7% L.) ro T. -A

rn

rce

cnis -A -A -A -A CT. CT. 1, ii iT cr, 00000 4: 4: 4 4 4. 4. 4. 0 0 n, n, n,

cc, ro .172 ...I) t . ro _n 01 s_n c., Lri ,j1 ..L't 4. 4. ljnJ ry ro o) co 0) co OD n) -A -A -I -A -A *...J o) ro o ro nJo o .r, 4. ro -A -A -A -A -A -A -01.1 -A -A o nj

1,m1"1

.- 4. tt,t oj nj ro ro n) ro N.. r-T. "' '.1.:. CO sic, 0) 0) 0) 0) .-.7., Cr* o o o .21 .4. W f.o'ro ..

=1...1% .7, .=-.. -1 o o .- .- -- o) %.) oi ..o (.o V., Cf. 'is .:::. C. C. C. 2. C. f.:%, Cr. Cf. CT'. 4. 4. 0 0 0 4.CC, 4. ....C. Or. 0 ..o co o) 0)-0) Ln .t. 4.:. 4. 4.:. ro 4. 4. -A -.1 -J -A -4 -A CC, 0: .-. C. ,..., 0 ro .- .- .r.. ..0 .... .. .r. -4 oj r.o -A --,j'-A .i, C7. Cr. Cr'. Cr. Cr, ..- .7, ro ro ro ro ro ro .- .1-- o) 4.;., I.:1) 1:0 ,- P--' P-- P-t -4

, . r , i....). .. ., .

Page 29: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

t. Or to woo

SIABLE 10

00.1ER TEST USING CHAZALIS METHOD

SYMBOL DEFINITION:

=

: =

=+ =

MEANGPANDLOUEPUPPER

OF .17rimpLt DATAMEAN OF DATAOUTLIEP LIMITOUTLIER LIMIT

-.36

k.EXCLUDING

-.1E 0.04 0.24 0.44. 0.64...+...,+

S 0.0+-A - +M - - *: +P - - :

-4:

L 4 : +E 5.0+ - : +

_: +

I - : +D - -

: +_

: +10.0+ - .

: +- : +- : +- _ +- _ +

15.0+ - -* +_

: +_ _ +_ _

: +- +

20 . o+ . _ - : +_ +_.

- ..- +.- _ +- _

: +25.0+ - +

- - +_

1).+

_ +

t- _ +

30.0+ : +_._ _ :* +

.

_v.

:

:

++

- - : +

INCFEMENTINCFEMENT

+ . + + + p + .4- a + I 4..36 .16. 0.04 ' 0 .24 .0.44

OF HOPIZONTAL :[ALE:OF VEPTICHL :CHLE:

2.00E-02.1.00

HO. OF POINT: PLOT1ED OH VEPTIf_AL :CALE: 34

26

.+ . . . . +0%64

Page 30: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

=M en1,-.1 rin 73

-4 -4 1C,1C g- =- momm_

-- m0 m

0 0 1C1En ti P-4 2

.7. ..--. ..--, lic) alc) a a. c)ic) c) c)11 1 1 1 liallillic) c)lic) CD -,

r rn rn rn OOOOOOOOOOOOOOO OO OO rnc ._., _... .-- .-- a. .r... a. i o .-- ro ,-- b- CI b- b- 0- .7... 0- C. C..' ..- C- C. ..7.. Co C. Cot C. b- 07.7.. ..7... CP C.' ..- .7....' C. il

1 = 1 C . t 1 I . ! 0 b..' C O 4 . c . 1 7 ` . c , 4 . 1 4 c o 4 i ) 4 0- r C I '.J O r e ) 1 1 7 * C r . 0- ..V. ro CO 4 4 roi CA '4) CO ro r .-- Zm r r r ON cDILI1 ro eN ND a -J L0 -J 4) Pa L0,-J ON OD 0 -4 -4 b-- 4. no ND C5 ro ro 0 ON Ui 4: CO - 0 C5ti e= IC I

ICI U IGI

= 3 3 -3,

eri en cn....

-71IP 0 0 0 1

I =i t L_ 8--

0 1 1 1r r r---... 1141 IIO 1141 1111.1.111IIIIIIVIIIIIIIIIIIIIIIIIII.....0 rn M rrl . diem. 0. 1313 1r3 ro ro N no ro ro ro rea no ro r0 no r0 r0 no no no no no no r0 no ro no no r0 Pa r0 r0 no Pa ro no ro r-

0 0 0 Ca OCa Ca Ca ) Ca C) CO CO 0 Ca Ca CO Ca (a Ca 6) Ca CO 6) C Ca Ca CO CO Ca Ca CO Oa Ca C Ca C -4, _ . - e - J -4-J A -A A -J -4 - 4 -J -4 -4 -A -4 -4 -4 -A -A -A -.1 -4 -A -4 -A -4 -A -4 -A -4 -4 -A -J -A -J t-

o-.

0 II II rin

13

,

r-"-aZ

... .... .... ..... .... .... ....P .... q--. .. a...A 1 Co ...r ao .... ,.. a, M C. C. Cr C. Co M ao C. C. .=, I. 0ru ro ro rU ro ru no no ro Pa r0 ro no Pa no no no no ro no no no r0 PO no r0 PO ro no ria Pa no no PiCT: O.. es. (r. (r. O: ON LT: Os. 17. 0., sT. 87. ON cr: O: (r. ON O: 0. ON o. os. cr. IT. CIN ON O: a: o ON Cr: ON a:ru ro no ru ro ro no r0 r PO PO no Pa r0 r0 PO rea PO riO no Pa no Pa Pa no Pa PO PO Pa no ro ro ro Pa

Page 31: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

BEST COPY AVAILABLE

TABLE 12

LEAST SQUARES CURVE FIT WITH OUTLIER REMOVED

CUPVE TYPE. OFDETEPHIMATIOH

A !TAGN1FICANCE90% 5% 9 %

1 YL:A+,1*.:, .9249 -3.081E-J: :.I LE 4

Y=A*::t1:.. .9418 2.,?44E-2 1.08t.

4 Y=.--A+,1-: 1.3755 yr.i..:A4TC-:, .71T1 3.5 -7.057T-26 36.71

NUMUR OF OF.:=EFNATIONC.: 33

...,... 4.

Page 32: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

TABLE 13

nthC°11401/1Aiii.

EQUATION TYPE 3

Yg.A*(XtB)

LEAST SQUARES CURVE FIT WITH OUTLIER REMOVED

COEFFICIENT VALUE STANDARDDEVIATION

95 %CONFIDENCE LIMITS

I A: 2.244E-2 1.166 1.642E--2 3.067E-2

E: 1.086 4..847E-2 .9872 1.105

29

Page 33: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

BEST COPY AVAILABLE

TABLE 14 /

EQUATION TYPE 1

Y=A+(3*X)

LEAST SQUARES CURVE FIT WITH OUTLIER REMOVED

COEFFICIENT VALUE :'.TANDAPD . 95CONFIDENCELIMIT

A: -3.01E-2 4.442E-2 -.1214 5.978E-2

E: 3.132E-2 1.603E-3 205E-2 3.459E-2

UTIMATED POPULATILN 'OTANDAPD DEVIATIONABOUT PEGFESSION LINE = .1041

c.

30

Page 34: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

BEST COPY AVAILABLE

TABLE 15

COMPARISON OF INDICES OF DETERMINATION

WITH OUTLIER REMOVED

95 CONFIDENCE 9 CONFIDENCE

CURVE NO. . CURVE NO.1 2 .:. 4 5 l'-'. 1 2 3 4 5 6

3 4 ' 3

6 6

.1 1

,... 25 5 ....

,..,

4 '4

gTERISKS INDICATE PAIRS OF CUP.VES ARE THE 'SAME'.(NOT SHOWN TO HAVE SIGNIFICANTLY DIFFERENT INDICES OF'DETERMINATION AT THE STATED CONFIDENCE LEVELS).

31

Page 35: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

c_n

i. 4-

4-

L.:.

C.:.

0fy

ft)

nj 1

".ry

-1

-.1

'4 -

1 -.

1-1

-1

rL.fl

4- L

t1Ln o LI o

0 4-

oo 0) o

ry ii

Cf:

.21

cr.

o Pi

rr

Pi

Ln

P-s.

..,-j

- T

.

-.7

. .7.

t_n

4- 4

-f_

nry

n.1

.-.1

7.=

g17

.I')

C.:,

cr.

4- -

I).c

..7

.I"

... .c

.-.

1 L.

)4-

.1.,

*7. C

.)CC.

C`.

4-

CT

. 4-

-.4

-1 4

--

CO

CC

,I"

,01

-1

4- -

-

.i. co 0) co ) o) cr. cr. o ...n o Lno n 01 ._n

4. 4

. 4. 4

. 0 n

j r.

nj-:

-A1'

0 -

.... C

..7

...7:

. CO

.....

......

'IQ...

: n:

-4 4- .i. 'ID 'ID

..L.) *1) .1) -- .--._ri o o o cr. co .)-.. -.

1

C.:.

1)I,.

.,-

J 4-

Cr.

.7. C

r. -

J.7

.....z

. c. c

f....

,i..

..., c

. ..4

,...t1

Lr.

Ls

Ln o

- 1

-.4

c.'4

7.,

*17.

,.c

. n.,

I",.,

c'7

..-

- ,

ry 4

. 4. v

t 0 0

7 .-

- ,..

n n

Ln o

... 1

) 4-

CT

. Cr.

.7.

Cr.

CY

. CT

. CO

'....

.*i

..-1

:U 4

- .-

-I r '7

1

II

II

I1

11

11

1111

III

I1

R.

-.j

r,.,

c_n

CC

, cr.

n:.1

) n:

0)Cr.

ocr.

..- .

..

pjLn

Ln

..

yj.

C.D

f_n cr. o)

co

-4 cn cr. o

h-1

r*:.,

CC

'C

O 4

- C

Y.

Ln17

' 4. P

.,f.T

. -1

P.,

-.1

C. -

4C

.1C

O R

.14 T

I4.

co

CO

C.:.

.7. C

C.

- 1

-..n

'7. -

1C

C,

-1 I.

.? 4

-11

P"

.1.)

-%1

CY

. Cr.

CT

. .7.

'7. 4

.f2

:1ro

4- -

1 -

1 -1

4. 4

.-

4-,1

)1)

4.)

Ln 01 01

n: 0

) co

0)

0) c

r.. -

-L.

: L.:

0ry

.1)

0) ,)

)

m P.1

6-4 0 r

CO

-.-

1--

1 C

Y. '

7'. C

Y.

.7..

1,1.

01 i_

n 4-

14

,1".

.'C

r'.-

11.r

.i.'

..,...

.)r.

..) n

: n: r

u -

.--

- -

.-.

.7.f:

t...

n.

0 i.0

.*.r

.er

. ...n

o 0

) ..z

. .r,

L.:

3er

. .--

n..

4- *

f.,C

r. C

O C

.:, C

C, .

c. 4

- 4-

4-

4- ..

.).1

7.C

r.4- l CO C0 -.1

-.1

.:.:.

....)

i....1

0)

:-..CO ,..0

%..:, 4- ,...,

..-..,

,.0 f..: 4-

.7. C

T. C

Y.

..7.

*i..

- 1

....

.r.,

*:,

'4.:,

'4.:.

.1..

*i.,

4- 4

- i..

n: n

: f.:

:I *-

-1 n

j-4 c...'

..

rn

Page 36: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

1) CO CO CO CO

4. 4. 41 C.! C.: C.) CO 1.0 ry ry n2 n2 n2 n2 N N n2 n2 ro n2 n2 -4-1 -4 0 0 ro c.

R.ti s 121 Ul 1%1 t-.11 0 0 (.2 .11 .4) :7

0 ...A Cr. -A W co n2 C

-CO .4:.' Cr. CT. III i-r! fr: Cr. 1:11 ill !A 4. Crl C..' rr r-Q

1%. C.2 r,.. r.., r.) r,., g 4* Cr. rt ..:1 r.2 =...... ," rf: L- - - =. Cr. CO Cr. r.2 ry -A 4. CI CO- Cr. -.1 f-f! 1. - cr. n., ro ..7 4. Cr. CO -A 4. - 1 CO CO

:1 Cr. Cr. 01 Cl (21 f_fl 4. 4. 4. 4. r..,

-1 4. v., - ," =' C' r.) 11- - N CO CO 0 CO CO OD n., ron2 TJj7jt co f.2

cr. -4 f.) .1) f. - 1:. .1) ...C. 0 0 -1 .» cr. cr, cr, co r-

I !- N N - - P.:, - - - - Cr - 4*. f"... 1'0 - cq4. 4. Cr. 4* 4. ,C1 CT oz.

- r . . . . . . .1... .11 Cr. I:, ," Cr, CO Cr' IIn., 4. CO CO LT L-J C.., 0-. I-0 ,..f7 4. ry cc,

cn CT. 1

J

.4) .4) 02 co co co co -1 cr. Cr, 17 Cr. .7'. 0 4* 4* 4- 4- 4. 0 CO C.:, CO CONNN-O .! C. .1:0 'L. -.10.... ry n2 n2 ro ry -J 4. 4. 4. 4. 4. CO CO 6.) ro n2 CF..

-.1 CO 4. ro ro c. -4 -A - 4. CO '0 4. Cr. Cr. 4.n2 CO 0) Cr. CP. .7. 4+ CIN o) 0) o) 0) o) co .- co 0) co cr. cr, "r1

rn

t7o

N ro) co 1 --.1 1 -.1 -.1 -.1 CT, Cr ,0 CY! f_r! 0.1 4. O.:. ;_:.. I-0 1.4

-a-..

..1 !Y. ...1, N ,-r: - '1:. -A -.1 -.1 -A O'. 1... D .C. C. C. C. C. C. 0..! s 1..., ,". -A -.1 ,1:* ro rQ 4* r...,

kr, 01 CO 4. - .... 0 0 0 0 ...n -A n2 n2 ry ry n2 n2 n2 ro n2 . co oo o) .- cv ,...v co -.-.4

a

Page 37: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

BESTSO?! AVM ABLE

The English Equation

. Our straight line equation, rounded to three decimal places,from Table 14 becomes

0 41*

H(text)E = .031X - .031 (2).

. Where: H(text)E= subjective textual information per text--English

. And: 'X = percentage of incorrect responses

It is interesting to note, however, that the -.031 intercept could wellbe zero for the population since its 95% confidenCe band spans thatvalue (see Table 14)4) This zero value is to be expected since thetheoretical information content should be zero for zero incorrectresponses. In other words, if a person does not make a response errorhe is predicting the teXt with certainty and, therefore, the text con-tains zero information for him. The English equation closely approxi-mates this. Also the slope could (at the 95Vlevel of confidence) spanfrom 0.028 to .0345.

The German Equation

The equation derived by Weltner for the German language ins:

H(text)G = .039X - .080 (3)

Where: H(text)G= subjective textual information per text--German

'And: X = percentage of incorrect responses

We have no measure of the confidence bands on the German data,but the agreement between the equations is quite remarkable. See thegraphs in Figure 2 (page 21..

One additional point in regard to the residuals is the extremeruns observed whiCh might suggest a somewhat better fit with higherdegree polynomials. This has not been pursued in this study.

34

Page 38: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

It is ,nteresting to note that the equation can be put into a

more useful form. The derivation is as follows:

Since:

which can be represented by E/N 100

Where E = number of wrongly guessed signs

And N text length

Then

H(text)E = .031X - .031

X = percentage of incorrect responses

M H(text) = .031[E/N 100] - .031

H(text) = 3.1E/N 7 .031

NH(text) = 3.1E .031N

Setting NH(text) = I, where I = information in bits we have

1

I = 3.1E - .031N

(2)

(4)

With the equation in this form we can now directly evaluate the Infor-mation content of a text of length N for a specific subject. In view

of our earlier evaluation of the confidence on the .03,1 coefficientwhich may well be zero, it may be concluded that our equation is inde-pendent of the text length N such that equation (4) at the 95% confi-dence level becomes

35

(5)

Page 39: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

Conclusion

The application of Shannon's guessing procedure permits us tomeasure the sub'ective information of a given text for a specitic

--learner. Un ike senseless texts, the subjective information of a.meaningful text can vary from learner to learner. The derivation ofequation (2), equation (4) and equation (5) in this study now allowsa direct evaluation of the "meaningfulness" of a specific English textfora specific learner.

Educational theory tells.us that the interactive events betweena.learner and his environment have a direct influence upon.his learning.SinCe these interactive events can now be specified in terms of infor-mation theory, the specification of information-theory-basedcriteriafor the selection and completion of education goals is poiSible.

These equations now permit us to measure information in terms of.a value that is dependent upon the internal state of the learner. Wehave always known that the internal state of the learner is directlyrelated to his ability to process information in a meaningful manner.(learning). We have also known that subject matter has an inherentdifficulty factor. The degree of this difficulty factor plus theinternal state of the learner are direct variables that influence the.effectiveness of an instructional system. The equations derived, inthis study, now provide a quantitative measure of these factors. This,in turn, permits us to specify attainable goals for both the instruc-tional system and the learner. In practice, this can provide us withthe means to specify precisely the instructions needed by the learners.

To the educator this means that he cannot only derive a quanti-tative measure of information contained in an instruction sequence foreach learner, but he can use this measure to evaluate the effectivenessof his instruction.

36

Page 40: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

REFERENCES

Ash, R. Information theory. New York: Interscience, 1965.

Chazal, M.' How to discard invalid measurements. Chemical Engineering,February, 1967. Pp. 182-184.

Edwards, E.

Frank, H.

Information transmission. London: Chapman' s Hall, 1964.

Kybernetische grundlagen der padagogik. Baden-Baden: AgisVerlag, 1962.

Fuchs, W. Cybernetics for the modern mind. New York: Macmillan, 197J.

Guilbaud, G. What is cybernetics? London:. William Heineman, 1959.

Khinchin, A. Ir Mathematical foundations of information theory. NewYork: Dover, 1957.

'Minsky, M. 'Semantic information processing.1968.

Cambridge: MIT Press,

A.

Singh, J. Great ideas in information theory, language and cybernetics.New York: Dover, 19't.

Shannon, C. The mathematical theory of communications.Technical Journal, 1948, 27, 279-425/623=656.

Shannon, C. Prediction and entropy of printed English.Technical Journal, January, 1951.

Shannon, C., & Weaver, W. The mathematical theory of communication.Chicago: University of Illinois Press, 1969.

Weltner, K. Zum ratetest nach Shannon. In H. Schmidt (Ed.),Grundlagenstudien aus kybernetik und geisteswissenschaft.Band 6. Quickborn Bei Hamburg: Schnelle Verlag, 1965.

Weitner, K. Der Shannonsche rateteSt in der praxis der programmierteninstruktion. Lehrmaschinen in kybernetischer urid padagogischersicht 4. Stuttgart: Ernst Klett Verlag, 1966. Pp. 40-$4.

ti

Weitner, K. Zur bestimmung der subjektiven information durch rate-tests. Praxis und perspektiven des programmierten unterricts.Band II. Bevensee, Quickborn, Germany: Verlag Schnelle, 1967.Pp. 69-74.

Bell System

Bell System

37

Page 41: DOCUMENT RESUME MD 095 900 Kauffman, Dan; And Others · DOCUMENT RESUME. MD 095 900 IR 001 082. AUTHbr Kauffman, Dan; And Others TITLE The Empirical Derivation of Equations for Predic'cing.

Weltner, K. Informationstheorie and erziehungswissenschaft. Quickborn:Verlag Schnelle, 1970.

38

at