
CHILDES II 08-11-2006 1

CHILDES II (Child Language Data Exchange System)

Jacqueline van Kampen

http://www.let.uu.nl/~Jacqueline.vanKampen/personal/








CHILDES II 08-11-2006 2

CHILDES Discussion mailing lists

[email protected] is an Internet list which promotes general discussion of issues relating to child language learning. It is not moderated, and only subscribers can post to the list. There are currently over 1600 subscribers and there are about six or seven messages posted to the list each week.

[email protected] provides more technical discussions of the CHILDES programs.

[email protected] is a discussion list for the Phon program.

All of these lists can be received as weekly digests. To subscribe to any of these lists, send a request to Mary MacWhinney at mary-at-cmu-dot-edu.

Website: http://childes.psy.cmu.edu


CHILDES II 08-11-2006 3

MOR lexicon in various languages

The MOR program provides a method for automatic tagging of corpora in the CHAT format. To make this work, separate MOR grammars have been constructed for each language.

There are working MOR grammars for Cantonese, Chinese, Dutch, English, French, German, Hebrew, Japanese, Italian and Spanish.

After analysis with MOR, users can then use the POST program to disambiguate the %mor line. There is a POST disambiguation database for English, but for other languages, users will need to do the work of training a POST database for themselves.

http://childes.psy.cmu.edu/morgrams/

We will not consider the MOR program here. Running it properly requires a good deal of setup and takes some time.


CHILDES II 08-11-2006 4

The MLU command and %mor line (NEW!)

MLU calculates the mean length of utterance

If the corpus has no %mor line, you can run MLU on the main line by adding the -t%mor switch, for example:

mlu +t*CHI -t%mor *.cha

Then you get "MLU in words". MLU counts each word as one word and does no morphemic analysis.

If the corpus has a %mor line, then MLU will give you a true MLU in number of “morphemes”.

Note: There is a folder morsamples under LIB that contains some files with %mor tiers

If you want to have an MLU in morphemes, you have to create a %mor line
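The "MLU in words" calculation can be illustrated with a minimal Python sketch. It is not the CLAN implementation: it assumes that the only tokens to skip are the utterance delimiters, it uses the population standard deviation, and the sample utterances are hypothetical.

```python
from statistics import mean, pstdev

# Assumption: these symbols are utterance delimiters, not words.
DELIMITERS = {".", "?", "!"}

def mlu_in_words(utterances):
    """Return (number of utterances, number of words, mean length, standard deviation)."""
    lengths = [
        len([tok for tok in utt.split() if tok not in DELIMITERS])
        for utt in utterances
    ]
    return len(lengths), sum(lengths), mean(lengths), pstdev(lengths)

# Hypothetical main-line utterances with the speaker code already stripped.
child_utterances = ["more cookie .", "want more juice .", "no ."]
n_utts, n_words, mlu, sd = mlu_in_words(child_utterances)
print(n_utts, n_words, round(mlu, 3), round(sd, 3))
```

With a %mor line, the same arithmetic would run over morphemes instead of words.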

CHILDES II 08-11-2006 5

Talkbank

TalkBank is an interdisciplinary research project funded by a 5-year grant entitled "TalkBank: A Multimodal Database of Communicative Interactions" from the National Science Foundation (BCS-998009, KDI, SBE) to Carnegie Mellon University and the University of Pennsylvania. Additional support comes from NSF ITR Grant 0324883.

At: http://talkbank.org/

TalkBank supports digital video and digital audio. If you want to set up a digital corpus, you can find all the instructions for linking your transcript to the video or the audio recording.




CHILDES II 08-11-2006 6

Simultaneous transcript and video (Lea’s question)

In order to browse a transcript and see the corresponding video you have to install QuickTime. The videos are at http://childes.psy.cmu.edu/media/

1) Open CLAN (here at: Q:\VFS\CLAN\CHILDES\CLAN\LIB).
2) Go to Window Webdata.
3) Double-click the Childes button.
4) Select the directory and then the file by double-clicking.
5) Put the cursor at the top of the file and type escape-8 (press the escape key, release it, then press the 8 key).

At: http://talkbank.org/tbviewer/

To stop, put the cursor in the QuickTime window, click and close with control-W, and then close the transcript with control-W.

Trouble with opening is most likely due to the presence of a firewall.


CHILDES II 08-11-2006 7

Sign language – BTS

The Berkeley Transcription System

designed by Dan Slobin, UC Berkeley and Nini Hoiting, Royal Institute for the Deaf “H. D. Guyot,” Haren, Netherlands

At: http://childes.psy.cmu.edu/manuals/bts.pdf

Under TalkBank you will also find a guide to creating video files for analysis with the CLAN editor, at http://talkbank.org/dv/











CHILDES II 08-11-2006 8

MLU, the +d option and Excel lists (Mechtild’s question)

+d This switch is used with MLU to create lists suitable for Excel. The command for the sample file is:

mlu +d +tCHI sample.cha

In order to run this type of analysis, you must have an @ID header for each participant you wish to track.

You can use the +t switch in the form +tCHI to examine a whole collection of files. In this case, all of the *CHI lines in the corpus will be examined.

RUN: mlu +tCHI *.cha +f +d +u

Try this out for:
1) all files in the folder ne20 under the LIB directory
2) merging all files together in one output file

CHILDES II 08-11-2006 9

MLU, the +d option and Excel lists (Mechtild’s question)

You will get this:

en|newengland|CHI|1;7.13|female|normal|.|Target_Child|.| 58 81 1.397 0.614
en|newengland|CHI|1;9.7|male|normal|.|Target_Child|.| 57 64 1.123 0.328
en|newengland|CHI|1;7.7|female|normal|.|Target_Child|.| 47 60 1.277 0.534
en|newengland|CHI|1;8.16|female|normal|.|Target_Child|.| 101 220 2.178 1.36
en|newengland|CHI|1;8.3|female|normal|.|Target_Child|.| 74 75 1.014 0.115

The fields refer to: the @ID field (language, corpus, file, age, participant id), number of utterances, number of morphemes, morphemes/utterances, standard deviation.

Open Excel and open the mlu file you just created.

Select: delimited

Select: space
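As an alternative to opening the file in Excel, one line of this output can be taken apart with a short Python sketch. The function name and dictionary keys are hypothetical. The only thing assumed about the format is what the sample lines show: the @ID fields are separated by "|" and are followed by four space-separated numbers.

```python
def parse_mlu_line(line):
    """Split one MLU +d output line into its @ID fields and its four numbers."""
    # Everything before the last "|" is the @ID part; the numbers follow it.
    id_part, _, numbers = line.rpartition("|")
    n_utts, n_morphemes, mlu, sd = numbers.split()
    return {
        "id_fields": id_part.split("|"),
        "utterances": int(n_utts),
        "morphemes": int(n_morphemes),
        "mlu": float(mlu),
        "sd": float(sd),
    }

row = parse_mlu_line(
    "en|newengland|CHI|1;7.13|female|normal|.|Target_Child|.| 58 81 1.397 0.614"
)
# The third number should equal morphemes divided by utterances (81 / 58).
print(row["id_fields"][3], row["mlu"], round(row["morphemes"] / row["utterances"], 3))
```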

CHILDES II 08-11-2006 10

STATFREQ, the +d2 option and Excel lists

If you want to count codes/words in several files and transport the counts to Excel to create tables, you use the STATFREQ program.

The STATFREQ program provides a way of producing a summary of word or code frequencies across a set of files.

+d2 With this option, the output is sent to a file in a very specific form that is useful for input to STATFREQ.

This option creates a stat.out file to keep track of multiple .frq.cex output files.

You do not need to use the +f option with +d2, because this is assumed. You must include a +t specification in order to tell the +d2 option which speaker to track for the STATFREQ analysis.

RUN: 1) freq +d2 +t*CHI *.cha and 2) statfreq stat.out.cex +f +d

Try this out again for: 1) all files in the folder ne20 under the LIB directory

CHILDES II 08-11-2006 11

STATFREQ, the +d2 option and Excel lists

You will get this (and a much longer list):

age     sex     group   SES  role          situation  *CHI:  a  again  ah  all  and  animals  apple
1;7.13  female  normal  .    Target_Child  .          247    1  1      0   0    1    1        0
1;9.7   male    normal  .    Target_Child  .          182    3  0      0   0    0    0        5

The list shows 1) the words used by the id (child), 2) in the specified files, and 3) the number of occurrences.

Open Excel and open the stat.out.sat.cex file you just created.

Select: delimited

Select: space




CHILDES II 08-11-2006 12

FREQ and the wildcard *

If you want to know the frequency of various codes that are given in the % tiers, you may use the wildcard (asterisk *).

An example from the sample2.cha in the LIB directory (these are Hungarian data). The %mor line in this file includes the codes: N:A|duck-ACC, N:I|plane-ACC, N:I|grape-ALL, and N:A|baby-ALL

(ACC = accusative; ALL = illative; N:A = animate noun; N:I = inanimate noun).

Question: what would be the command for the frequency of all the animate nouns (N:A)?

RUN: freq +t%mor +s"N:A|*" sample2.cha

The Output:
1 n:a|baby-all
1 n:a|ball-acc
1 n:a|duck-acc
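What the asterisk wildcard does in this FREQ search can be shown with a small Python sketch. It uses Python's own fnmatch wildcard matching as a stand-in, not CLAN's matcher. The %mor tiers below are hypothetical lines built from the four codes named on this slide, and the case-insensitive matching is an assumption based on the lowercased output.

```python
from collections import Counter
from fnmatch import fnmatchcase

def count_matching_codes(mor_tiers, pattern):
    """Count the codes on %mor tiers that match a * wildcard pattern."""
    counts = Counter()
    for tier in mor_tiers:
        for code in tier.split():
            # Assumption: matching ignores case, so both sides are lowercased.
            if fnmatchcase(code.lower(), pattern.lower()):
                counts[code.lower()] += 1
    return counts

# Hypothetical %mor tiers using the codes named on this slide.
tiers = ["N:A|duck-ACC N:I|plane-ACC", "N:I|grape-ALL N:A|baby-ALL"]
animate = count_matching_codes(tiers, "N:A|*")
for code, n in sorted(animate.items()):
    print(n, code)
```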

CHILDES II 08-11-2006 13

KWAL and the wildcard *

How would you use the wildcard for the command KWAL:
1) in the file eve15short.mor.cex under LIB\morsamples
2) for all speakers
3) the morphological codes a. proper name and b. determiner

RUN: kwal +t%mor +s"n:prop|*" +s"det|*" eve15short.mor.cex

Note: the file is not a .cha file but a .cex file (in legible CHAT format), because it was created from the input file by running the MOR command.

CHILDES II 08-11-2006 14

The CHSTR command

CHSTRING changes one string to another string in an ASCII text file. It is useful when you want to correct spelling, or make other changes. It allows you to make a number of changes in several files at the same time.

+c option You can make many changes in the data by using a text editor to create an ASCII text file containing a list of words to be changed and what they should be changed to. This file should conform to this format: " oldstring " " newstring "

If you surround the oldstring and the newstring with spaces, CHSTRING treats them like words. This means that they are matched if they are preceded or followed by spaces or other delimiters.

The default name for the file listing the changes is changes.cut. If you don’t specify a file name at the +c option, the program searches for changes.cut.

CHILDES II 08-11-2006 15

The CHSTR command

Make: a changes.cut with " eat " " quark "

Run: chstring +c sample.cha

See how the original file sample.cha has changed
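The effect of such a changes.cut file can be previewed with a minimal Python sketch. It is plain string replacement, not CHSTRING itself: it only handles the space-delimited case and ignores the "other delimiters" that CHSTRING also accepts. The function names and the sample line are hypothetical.

```python
import re

def parse_changes(text):
    """Read lines of the form "oldstring" "newstring" into (old, new) pairs."""
    pairs = []
    for line in text.splitlines():
        quoted = re.findall(r'"([^"]*)"', line)
        if len(quoted) == 2:
            pairs.append((quoted[0], quoted[1]))
    return pairs

def apply_changes(line, pairs):
    """Apply each replacement in order to one transcript line."""
    for old, new in pairs:
        line = line.replace(old, new)
    return line

pairs = parse_changes('" eat " " quark "')
changed = apply_changes("*CHI:\tI eat cookies .", pairs)
untouched = apply_changes("*CHI:\tI have eaten .", pairs)
print(changed)
print(untouched)
```

Because the old string carries its surrounding spaces, "eaten" is left alone, which is the word-like behaviour described for the space-delimited format.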

CHILDES II 08-11-2006 16

Tracking final words

In order to find the final words of utterances, you need to use the complete delimiter set in your COMBO search string. You can do this with this syntax (\!+?+.) where the parentheses enclose a set of alternative delimiters. In order to specify the single word that appears before these delimiters, you can use the asterisk wildcard.

Try this out for:
1) all files in the folder ne20 under the LIB directory
2) the question words what, who, where, which, why in sentence final position

RUN: combo +s"wh*^(\!+?+.)" +t*CHI +u *.cha
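The idea behind this search (a word matching wh* directly before one of the delimiters ! ? .) can be sketched in Python with a regular expression. This illustrates the logic only, not how COMBO works internally. The helper names and the example utterances are hypothetical.

```python
import re
from fnmatch import fnmatchcase

# The last word of an utterance: a run of non-space characters followed by
# one of the three delimiters at the very end of the line.
FINAL_WORD = re.compile(r"(\S+)\s*[.!?]\s*$")

def final_word(utterance):
    """Return the word directly before the final delimiter, or None."""
    match = FINAL_WORD.search(utterance)
    return match.group(1) if match else None

def ends_with(utterance, pattern):
    """True if the utterance-final word matches a * wildcard pattern."""
    word = final_word(utterance)
    return word is not None and fnmatchcase(word.lower(), pattern)

print(ends_with("you want what ?", "wh*"))
print(ends_with("what is that?", "wh*"))
```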

CHILDES II 08-11-2006 17

Tracking Initial Words (p. 68 in the CLAN Manual)

Because there is no special character that marks the beginnings of utterances, it is difficult to compose search strings to track items at utterance initial position. To solve this problem, you can use CHSTRING to insert sentence initial markers. A good marker to use is the ++ symbol, which is only rarely used for other purposes. You can use this command:

chstring +c -t@ -t% +t* *.cha

You also need to have a file called changes.cut that has this one line:

": " ": ++"

In this one-line file, there are two quoted strings. The first has a colon followed by a tab. The second has a colon followed by a tab and then a double plus.

Make a changes.cut file under \LIB

CHILDES II 08-11-2006 18

Initial qu* (Magda’s question)

Now try out: chstring +c -t@ -t% +t* 55.cha

And see what you get.

CHSTRING changes the ": " but not in initial position! How come?

I have at the moment no answer to this question.

My guess: the colon after a tier will not be read as being part of the utterance.

CHILDES II 08-11-2006 19

Final qu* (Magda’s question)

For French the question words are que, qui, quoi, quand, quel.

Now use a COMBO command to extract all French question words in final position:
1) in all files in the Champaud folder under the LIB directory
2) for the mother only
3) merging all files together

RUN: combo +s"qu*^(\!+?+.)" *.cha +u +t*MOT

CHILDES II 08-11-2006 20

A COMBO search (Magda)

Recall that COMBO searches for a combination of words combined with Boolean operators like "and" or "or".

Some operators used with combo

^  immediately followed by
*  repeated character
+  OR
!  NOT

Examples

want directly followed by to:
combo +s"want^to" sample.cha

want eventually followed by to:
combo +s"want^*^to" sample.cha

both want and to in any order:
combo +s"want^to" +x sample.cha

How would you do this when looking in the files in the Champaud folder:
1) penser + question word
2) for all speakers
3) merging all files together

RUN: combo +spens*^*qu* +u *.cha
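The difference between the three example searches (directly followed, eventually followed, any order) can be made concrete with a small Python sketch over lists of words. The function names are hypothetical and the functions only mimic the meaning of the searches, not COMBO's own matching. One assumption: "eventually followed" here also accepts the case where nothing stands in between.

```python
def immediately_followed(tokens, first, second):
    """True if `second` is the very next word after `first` somewhere."""
    return any(a == first and b == second for a, b in zip(tokens, tokens[1:]))

def eventually_followed(tokens, first, second):
    """True if `second` occurs anywhere after the first occurrence of `first`."""
    if first not in tokens:
        return False
    return second in tokens[tokens.index(first) + 1:]

def both_any_order(tokens, first, second):
    """True if both words occur, regardless of their order."""
    return first in tokens and second in tokens

# Hypothetical utterances, already split into words.
direct = "I want to go".split()
distant = "I want you to go".split()
reversed_order = "to want".split()

print(immediately_followed(direct, "want", "to"))
print(immediately_followed(distant, "want", "to"), eventually_followed(distant, "want", "to"))
print(eventually_followed(reversed_order, "want", "to"), both_any_order(reversed_order, "want", "to"))
```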

CHILDES II 08-11-2006 21

Piping possibilities

If you want to extract all and only utterances with (for example) a specific verb and its conjugations, you might want to use the options +s and -s. BUT: in KWAL the +s and -s options cannot be used together.

For example: kwal +swait* -swaiter +d adam17.cha won’t work.

However, you can use the piping function: kwal +swait* +d | kwal -swaiter +d

Try this out for: 1) all conjugations of the verb ‘go’; 2) in file adam17; 3) for speaker = child

Make a frequency list: freq +sgo* adam17.cha

Pipe the +sgo* command with all words to be excluded: kwal +sgo* +d +t*CHI adam17.cha | kwal -s"good" -s"got" +d
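The two-step logic of this pipe (first keep everything matching go*, then drop the unwanted matches) can be sketched in Python as two filters applied one after the other. This only illustrates the idea of piping. The function names and the utterances are hypothetical, not taken from adam17.cha.

```python
from fnmatch import fnmatchcase

def keep_matching(utterances, pattern):
    """First stage: keep utterances containing a word that matches the pattern."""
    return [u for u in utterances if any(fnmatchcase(w, pattern) for w in u.split())]

def drop_containing(utterances, excluded):
    """Second stage: drop utterances containing any of the excluded words."""
    return [u for u in utterances if not any(w in excluded for w in u.split())]

# Hypothetical child utterances.
utterances = ["I go home .", "that is good .", "he goes there .", "I got it ."]
stage_one = keep_matching(utterances, "go*")
stage_two = drop_containing(stage_one, {"good", "got"})
print(stage_two)
```

The output of the first filter is the input of the second, which is what the | symbol does between the two KWAL commands.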

CHILDES II 08-11-2006 22

More piping possibilities

How would you code the following?

Desired Output: a frequency count of the words
- want and to when either one of them is coded as an imitation on the %spa line
- neat when it is a continuation on the %spa line

combo +s(want^to^*^%spa:^*^$INI*)+(neat^*^%spa:^*^$CON*) +t%spa +f +d sample.cha
freq +swant +sto +sneat sample.cmb.cex

The +s option specifies that the words want, to, and $INI may occur in any order on the selected tiers. Same with neat and $CON.

The +t%spa option must be added in order to allow the program to look at the %spa tier when searching for a match.

The +d option is used to specify that the information produced by the program, such as file name, line number and exact position of words on the tier, should be excluded from the output. This way the output is in a legal CHAT format and can be used as an input to another CLAN program, FREQ in this case. The same effect could also be obtained by using the piping feature.

HOW?

CHILDES II 08-11-2006 23

More on tracking a subset: the +z option

+z This option allows the user to select any range of words, utterances, or speaker turns to be analyzed. The range specifications should immediately follow the option.

+z10w analyze the first ten words only.
+z10u analyze the first ten utterances only.
+z10w-20w analyze 11 words starting with the 10th word.
ETC.

Try this out for: 1) 0012.cha under LIB\ 2) speaker = child 3) the first 20 utterances

mlu +z20u +t*CHI sample.cha

If the +z option is used together with the +s option, you should use piping.

Try this out for: 1) sample.cha 2) speaker = child 3) the utterances 1-3 4) the word Mommy

kwal +d +z1-3u +t*CHI sample.cha | kwal +sMommy
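The utterance ranges used with +z can be read as simple slices of the list of utterances. The Python sketch below covers only the two utterance forms that appear on this slide ("20u" and "1-3u"). The function name is hypothetical, and the inclusive reading of "1-3u" is an assumption based on the description of the 10w-20w example.

```python
import re

def select_utterances(utterances, spec):
    """Select utterances by a range such as "20u" (first 20) or "1-3u" (1st to 3rd)."""
    match = re.fullmatch(r"(?:(\d+)-)?(\d+)u", spec)
    if match is None:
        raise ValueError(f"unsupported range: {spec}")
    start = int(match.group(1)) if match.group(1) else 1
    end = int(match.group(2))
    # Ranges are 1-based and inclusive, so shift the start for Python slicing.
    return utterances[start - 1:end]

# Hypothetical utterances.
utts = ["one .", "two .", "three .", "four .", "five ."]
print(select_utterances(utts, "3u"))
print(select_utterances(utts, "2-4u"))
```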

CHILDES II 08-11-2006 24

depfile.cut in the /lib directory (CHILDES\CLAN\LIB\depfile)

The depfile distributed with CLAN lists the legitimate CHAT headers and dependent tier names as well as many of the strings allowed within the main line and the various dependent tiers.

When running CHECK, you should have the file called depfile.cut located in your LIB (=library) directory, which you set from the Commands window. If the programs cannot find the depfile, they will query you for its location.

In the LIB directory kid10.cha has a large number of CHAT errors.

Choose: LIB\ as your working directory
Run CHECK on the file kid10.cha

If CHECK finds errors, you have to fix them. See the CLAN manual, p. 51

CHILDES II 08-11-2006 25

CHIP and repetitions in conversation (Magda)

CHIP analyzes specified pairs of utterances. CHIP can be used to explore parental input and the relation between speech acts and imitation. When you use CHIP for data in a publication, you should cite Sokolov and MacWhinney (1990). There are four major aspects of CHIP: (1) the tier creation system, (2) the coding system, (3) the technique for defining substitution classes, and (4) the nature of the summary statistics.

See the CLAN Manual at pages 53-59

CHIP compares two specified utterances and produces an analysis that it then inserts onto a new coding tier.

The first utterance in the designated utterance pair is the “source” utterance and the second is the “response” utterance. The response is compared to the source.

Speakers are designated by the +b and +c options.
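To give a feel for what comparing a response to its source can involve, here is a simplified Python sketch. It is not CHIP's coding system. The categories (shared, added, deleted) and the repetition ratio are illustrative assumptions, and the function name is hypothetical.

```python
def compare_pair(source, response):
    """Compare a response utterance to its source utterance, word by word."""
    src = source.lower().split()
    rsp = response.lower().split()
    shared = [w for w in rsp if w in src]       # words the response takes over
    added = [w for w in rsp if w not in src]    # words new in the response
    deleted = [w for w in src if w not in rsp]  # source words left out
    repetition = len(shared) / len(rsp) if rsp else 0.0
    return {"shared": shared, "added": added, "deleted": deleted, "repetition": repetition}

# Source = adult utterance, response = child utterance (delimiters removed).
result = compare_pair("a hat", "that a hat")
print(result)
```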

CHILDES II 08-11-2006 26

CHIP and repetitions in conversation

@Begin
@Participants: MOT Mother, CHI Child
*MOT: what-'is that?
*CHI: hat.
*MOT: a hat!
*CHI: a hat.
*MOT: and what-'is this?
*CHI: that a hat!
*MOT: yes that-'is the hat.
@End

RUN: chip +bMOT +cCHI chip.cha
RUN: chip +bMOT +cCHI -ns -nb chip.cha

There are four aspects of CHIP:
1) the tier creation system
2) the coding system
3) the technique for defining substitution classes
4) the nature of the summary statistics

Try CHIP out for: 1) chip.cha file 2) +bMOT and +cCHI

There is a chip.cha file under CLAN to try this command out

With the options -ns and -nb, CHIP produces an analysis of the child’s responses to an adult utterance only. Try this out.

CHILDES II 08-11-2006 27

An exercise

Try out the piping option of:

combo +s(want^to^*^%spa:^*^$INI*)+(neat^*^%spa:^*^$CON*) +t%spa +f +d sample.cha
freq +swant +sto +sneat sample.cmb.cex

More on my web: http://www.let.uu.nl/~Jacqueline.vanKampen/personal/







