CHILDES II 08-11-2006 1
CHILDES II (Child Language Data Exchange System)
Jacqueline van Kampen
http://www.let.uu.nl/~Jacqueline.vanKampen/personal/
CHILDES discussion mailing lists
[email protected] is an Internet list which promotes general discussion of issues relating to child language learning. It is not moderated, and only subscribers can post to the list. There are currently over 1600 subscribers and there are about six or seven messages posted to the list each week.
[email protected] provides more technical discussions of the CHILDES programs.
[email protected] is a discussion list for the Phon program.
All of these lists can be received as weekly digests. To subscribe to any of these lists, send a request to Mary MacWhinney at mary-at-cmu-dot-edu.
Website: http://childes.psy.cmu.edu
MOR lexicon in various languages
The MOR program provides a method for automatic tagging of corpora in the CHAT format. To make this work, separate MOR grammars have been constructed for each language.
There are working MOR grammars for Cantonese, Chinese, Dutch, English, French, German, Hebrew, Japanese, Italian and Spanish.
After analysis with MOR, users can then use the POST program to disambiguate the %mor line. There is a POST disambiguation database for English, but for other languages, users will need to do the work of training a POST database for themselves.
http://childes.psy.cmu.edu/morgrams/
We will not consider the MOR program here: running it properly requires a lot of setup, which takes time.
The MLU command and %mor line (NEW!)
MLU calculates the mean length of utterance
If the corpus has no %mor line, you run MLU on the main line by adding the -t%mor switch, for example:
mlu +t*CHI -t%mor *.cha
You then get "MLU in words": MLU counts each word as one unit and does no morphemic analysis.
If the corpus has a %mor line, then MLU will give you a true MLU in number of “morphemes”.
Note: There is a folder morsamples under LIB that contains some files with %mor tiers
If you want to have an MLU in morphemes, you have to create a %mor line
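To make the difference between MLU in words and MLU in morphemes concrete, here is a minimal Python sketch. It assumes a simplified tier format: the main line is a speaker code such as *CHI: followed by space-separated words and a terminator, and each %mor word has the form pos|stem with optional -SUFFIX parts, where the stem and every suffix count as one morpheme. The function names are hypothetical, and CLAN's real counting rules (clitics, compounds, excluded forms) are more detailed.

```python
import statistics

TERMINATORS = {".", "?", "!"}


def tier_tokens(line: str) -> list[str]:
    """Tokens after the tier label (e.g. '*CHI:' or '%mor:'), without terminators."""
    _, _, text = line.partition(":")
    return [tok for tok in text.split() if tok not in TERMINATORS]


def words_in(main_line: str) -> int:
    # MLU in words: every token on the main line counts as one word.
    return len(tier_tokens(main_line))


def morphemes_in(mor_tier: str) -> int:
    # MLU in morphemes (simplified): one for the stem plus one per '-SUFFIX'.
    total = 0
    for tok in tier_tokens(mor_tier):
        _, _, stem_and_suffixes = tok.partition("|")
        total += 1 + stem_and_suffixes.count("-")
    return total


def mlu(counts: list[int]) -> tuple[float, float]:
    """Mean length of utterance and population standard deviation (an assumed formula)."""
    return statistics.mean(counts), statistics.pstdev(counts)


if __name__ == "__main__":
    main_lines = ["*CHI:\tmore cookies .", "*CHI:\twant ."]
    mor_tiers = ["%mor:\tqn|more n|cookie-PL .", "%mor:\tv|want ."]
    print("MLU in words:", mlu([words_in(u) for u in main_lines]))
    print("MLU in morphemes:", mlu([morphemes_in(m) for m in mor_tiers]))
```

The same two utterances give a higher value in morphemes than in words, because the plural suffix on cookie-PL is counted separately.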
TalkBank
TalkBank is an interdisciplinary research project funded by a 5-year grant entitled "TalkBank: A Multimodal Database of Communicative Interactions" from the National Science Foundation (BCS-998009, KDI, SBE) to Carnegie Mellon University and the University of Pennsylvania. Additional support comes from NSF ITR Grant 0324883.
At: http://talkbank.org/
TalkBank supports digital video and digital audio. If you want to set up a digital corpus, you can find full instructions there for linking your transcript to the video or audio recording.
1) Open CLAN (here at: Q:\VFS\CLAN\CHILDES\CLAN\LIB).
2) Go to Window > Webdata.
3) Double-click the Childes button.
4) Select the directory by double-clicking it, and then the file.
5) Put the cursor at the top of the file and type Escape-8 (press the Escape key, release it, then press the 8 key).
At: http://talkbank.org/tbviewer/
To stop, put the cursor in the QuickTime window, click, and close it with Control-W; then close the transcript with Control-W.
Trouble opening the files is most likely due to a firewall.
Simultaneous transcript and video (Lea’s question)
To browse a transcript and see the corresponding video, you have to install QuickTime. The videos are at http://childes.psy.cmu.edu/media/
Sign language – BTS
The Berkeley Transcription System
Designed by Dan Slobin (UC Berkeley) and Nini Hoiting (Royal Institute for the Deaf “H. D. Guyot,” Haren, Netherlands)
Under Talkbank you will also find a guide to create video files for analysis with the CLAN editor, at http://talkbank.org/dv/
At: http://childes.psy.cmu.edu/manuals/bts.pdf
MLU, the +d option and Excel lists (Mechtild’s question)
+d This switch is used with MLU to create lists suitable for Excel. The command for the sample file is: mlu +d +tCHI sample.cha
In order to run this type of analysis, you must have an @ID header for each participant you wish to track.
You can use the +t switch in the form +tCHI to examine a whole collection of files. In this case, all of the *CHI lines in the corpus will be examined.
RUN: mlu +tCHI *.cha +f +d +u
Try this out for: 1) all files in the folder ne20 under the LIB directory; 2) merging all files together in one output file
MLU, the +d option and Excel lists (Mechtild’s question)
You will get output like this:
en|newengland|CHI|1;7.13|female|normal|.|Target_Child|.| 58 81 1.397 0.614
en|newengland|CHI|1;9.7|male|normal|.|Target_Child|.| 57 64 1.123 0.328
en|newengland|CHI|1;7.7|female|normal|.|Target_Child|.| 47 60 1.277 0.534
en|newengland|CHI|1;8.16|female|normal|.|Target_Child|.| 101 220 2.178 1.36
en|newengland|CHI|1;8.3|female|normal|.|Target_Child|.| 74 75 1.014 0.115
Open Excel and open the mlu file you just created
Select: delimited
Select: space
The fields are: the @ID fields (language, corpus, speaker, age, sex, group, SES, role, situation), then the number of utterances, the number of morphemes, morphemes per utterance, and the standard deviation.
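If you prefer a script to Excel's import dialog, the lines can also be parsed directly. The Python sketch below assumes each line is the pipe-delimited @ID block followed by four numbers (utterances, morphemes, MLU, standard deviation); the field names follow the STATFREQ column headers (age, sex, group, SES, role, situation) plus language, corpus and speaker, and the helper names are hypothetical.

```python
import csv
import io

# Assumed names for the nine @ID fields, in the order they appear on each line.
ID_FIELDS = ["language", "corpus", "speaker", "age", "sex",
             "group", "SES", "role", "situation"]
NUMBER_FIELDS = ["utterances", "morphemes", "mlu", "sd"]


def parse_mlu_record(line: str) -> dict:
    """Split one line of 'mlu +d' output into named fields."""
    id_part, *numbers = line.split()
    record = dict(zip(ID_FIELDS, id_part.rstrip("|").split("|")))
    record["utterances"] = int(numbers[0])
    record["morphemes"] = int(numbers[1])
    record["mlu"] = float(numbers[2])
    record["sd"] = float(numbers[3])
    return record


def to_csv(lines: list[str]) -> str:
    """Turn several 'mlu +d' lines into CSV text that Excel can open directly."""
    rows = [parse_mlu_record(line) for line in lines if line.strip()]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=ID_FIELDS + NUMBER_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "en|newengland|CHI|1;7.13|female|normal|.|Target_Child|.| 58 81 1.397 0.614"
    print(to_csv([sample]))
```

A quick sanity check on the sample line: 81 morphemes / 58 utterances rounds to 1.397, the MLU value in the third numeric column.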
STATFREQ, the +d2 option and Excel lists
RUN: 1) freq +d2 +t*CHI *.cha and 2) statfreq stat.out.cex +f +d
If you want to count codes or words in several files and transfer the counts to Excel to create tables, use the STATFREQ program.
The STATFREQ program provides a way of producing a summary of word or code frequencies across a set of files.
Try this out again for: 1) all files in the folder ne20 under the LIB directory
+d2 With this option, the output is sent to a file in a very specific form that is useful for input to STATFREQ.
This option creates a stat.out file to keep track of multiple .frq.cex output files.
You do not need to use the +f option with +d2, because this is assumed. You must include a +t specification in order to tell the +d2 option which
speaker to track for the STATFREQ analysis.
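The kind of summary STATFREQ builds, one row per file and one column per word, can be sketched in a few lines of Python. This is only an illustration of the table's shape, assuming you already have each file's word list; it is not how STATFREQ reads .frq.cex files, and the function names are hypothetical.

```python
from collections import Counter


def frequency_table(files: dict[str, list[str]]) -> tuple[list[str], list[list]]:
    """One row per file, one column per word (missing words count as 0)."""
    counts = {name: Counter(words) for name, words in files.items()}
    vocabulary = sorted(set().union(*counts.values()))
    header = ["file"] + vocabulary
    rows = [[name] + [counts[name][word] for word in vocabulary] for name in files]
    return header, rows


def as_space_delimited(header: list[str], rows: list[list]) -> str:
    """Space-delimited text, suitable for Excel's 'delimited / space' import."""
    lines = [" ".join(header)] + [" ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    header, rows = frequency_table({"f1.cha": ["a", "ball", "a"],
                                    "f2.cha": ["ball", "apple"]})
    print(as_space_delimited(header, rows))
```

Taking the union of all vocabularies first is what gives every file the same columns, so words a child never used show up as explicit zeros.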
STATFREQ, the +d2 option and Excel lists
You will get this (and a much longer list): 1) the words used by the selected speaker (the child), 2) in the specified files, 3) with their number of occurrences.
Open Excel and open the stat.out.sat.cex file you just created
Select: delimited
Select: space
age sex group SES role situation *CHI: a again ah all and animals apple
1;7.13 female normal . Target_Child . 247 1 1 0 0 1 1 0
1;9.7 male normal . Target_Child . 182 3 0 0 0 0 0 5
1;7.7 female normal . Target_Child . 169 6 0 0 0 0 0 0
1;8.16 female normal . Target_Child . 288 12 0 2 1 2 1 0
1;8.3 female normal . Target_Child . 268 1 0 2 0 0 0 0
FREQ and the wildcard *
Question: what would be the command for the frequency of all the animate nouns (N:A)?
The output:
1 n:a|baby-all
1 n:a|ball-acc
1 n:a|duck-acc
If you want to know the frequency of various codes that are given on the % tiers, you can use the wildcard (asterisk *).
An example from sample2.cha in the LIB directory (these are Hungarian data). The %mor line in this file includes the codes: N:A|duck-ACC, N:I|plane-ACC, N:I|grape-ALL, and N:A|baby-ALL
(ACC = accusative; ALL = allative; N:A = animate noun; N:I = inanimate noun).
RUN: freq +t%mor +s"N:A|*" sample2.cha
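What the wildcard does in this FREQ search can be illustrated with Python's fnmatch module, where * also matches any run of characters. This is a sketch, not FREQ itself: it assumes %mor codes are separated by spaces and lowercases both codes and pattern, which mirrors the lowercase output above but may not match FREQ's exact case rules.

```python
from collections import Counter
from fnmatch import fnmatchcase


def freq_codes(mor_tiers: list[str], pattern: str) -> Counter:
    """Count %mor codes matching a wildcard pattern such as 'N:A|*'."""
    pattern = pattern.lower()
    counts = Counter()
    for tier in mor_tiers:
        _, _, text = tier.partition(":")  # drop the '%mor:' label
        for code in text.split():
            code = code.lower()
            if fnmatchcase(code, pattern):
                counts[code] += 1
    return counts


if __name__ == "__main__":
    tiers = ["%mor:\tN:A|duck-ACC N:I|plane-ACC .",
             "%mor:\tN:I|grape-ALL N:A|baby-ALL ."]
    for code, n in sorted(freq_codes(tiers, "N:A|*").items()):
        print(n, code)
```

Note that fnmatch gives [ ] and ? special meanings, so codes containing those characters would need escaping in a real script.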
KWAL and the wildcard *
How would you use the wildcard with the KWAL command: 1) in the file eve15short.mor.cex under LIB\morsamples; 2) for all speakers; 3) for the morphological codes a. proper name and b. determiner?
RUN: kwal +t%mor +s"n:prop|*" +s"det|*" eve15short.mor.cex
Note: the file is not a .cha but a .cex file (in legible CHAT format), because it was created from the input file by running the MOR command.
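Giving KWAL two +s switches asks for utterances that match either pattern. A rough Python analogue, assuming that several +s patterns act as alternatives (OR) and that each utterance is available as a (main line, %mor tier) pair, is sketched below; the function name and data layout are hypothetical.

```python
from fnmatch import fnmatchcase


def kwal_like(utterances: list[tuple[str, str]], patterns: list[str]) -> list[str]:
    """Return main lines whose %mor tier has a code matching ANY of the patterns."""
    patterns = [p.lower() for p in patterns]
    hits = []
    for main_line, mor_tier in utterances:
        _, _, text = mor_tier.partition(":")  # drop the '%mor:' label
        codes = [code.lower() for code in text.split()]
        if any(fnmatchcase(code, p) for code in codes for p in patterns):
            hits.append(main_line)
    return hits


if __name__ == "__main__":
    data = [("*CHI:\tthat Fraser pencil .", "%mor:\tpro:dem|that n:prop|Fraser n|pencil ."),
            ("*MOT:\tthe pencil .", "%mor:\tdet|the n|pencil ."),
            ("*CHI:\tyes .", "%mor:\tco|yes .")]
    for line in kwal_like(data, ["n:prop|*", "det|*"]):
        print(line)
```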
The CHSTRING command
+c option You can make many changes in the data by using a text editor to create an
ASCII text file containing a list of words to be changed and what they should be changed to. This file should conform to this format: " oldstring " " newstring "
If you surround the oldstring and the newstring with spaces, CHSTRING treats them like words. This means that they are matched only if they are preceded and followed by spaces or other delimiters.
The default name for the file listing the changes is changes.cut. If you don’t specify a file name at the +c option, the program searches for changes.cut.
CHSTRING changes one string to another string in an ASCII text file. It is useful when you want to correct spelling, or make other changes. It allows you to make a number of changes in several files at the same time.
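The changes.cut idea, a list of quoted old/new pairs where space-padded entries match whole words only, can be sketched in Python. This is an illustration under assumptions, not CHSTRING: it reads each line with shlex (so the quotes are removed and inner spaces kept) and treats only whitespace and line edges as word boundaries, whereas CHSTRING also recognizes "other delimiters". The function names are hypothetical.

```python
import re
import shlex


def load_changes(text: str) -> list[tuple[str, str]]:
    """Read lines of the form "oldstring" "newstring" (the changes.cut layout)."""
    pairs = []
    for line in text.splitlines():
        if line.strip():
            old, new = shlex.split(line)  # quotes removed, inner spaces kept
            pairs.append((old, new))
    return pairs


def apply_changes(line: str, pairs: list[tuple[str, str]]) -> str:
    for old, new in pairs:
        if old.startswith(" ") and old.endswith(" "):
            # Space-padded entry: replace whole words only
            # (here: bounded by whitespace or the start/end of the line).
            word = re.escape(old.strip())
            replacement = new.strip()
            line = re.sub(rf"(?<!\S){word}(?!\S)", lambda _m: replacement, line)
        else:
            # Unpadded entry: replace every occurrence of the raw string.
            line = line.replace(old, new)
    return line


if __name__ == "__main__":
    pairs = load_changes('" eat " " quark "\n')
    print(apply_changes("*CHI:\tI eat cookies and eating .", pairs))
```

With the " eat " entry, "eat" becomes "quark" but "eating" is left alone. An unpadded pair such as ":\t" to ":\t++" is applied as a plain substring replacement, which is the kind of change used to mark utterance-initial position.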
The CHSTRING command
Make a changes.cut file with the line: " eat " " quark "
Run: chstring +c sample.cha
See how the original file sample.cha has changed
Tracking final words
RUN: combo +s"wh*^(\!+?+.)" +t*CHI +u *.cha
To find the final words of utterances, you need to use the complete delimiter set in your COMBO search string. You can do this with the syntax (\!+?+.), where the parentheses enclose a set of alternative delimiters separated by + (OR); the ! is escaped as \! because on its own it is the NOT operator. To specify the single word that appears before these delimiters, you can use the asterisk wildcard.
Try this out for: 1) all files in the folder ne20 under the LIB directory; 2) the question words what, who, where, which, why in sentence-final position
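The search pattern word^(\!+?+.) means "a word immediately followed by one of the delimiters". A minimal Python sketch of that check, assuming the terminator is written as a separate token after a space (as in the CHAT lines shown elsewhere in these slides), is given below; it is an illustration, not COMBO's matcher.

```python
from fnmatch import fnmatchcase

DELIMITERS = {"!", "?", "."}  # the alternatives in (\!+?+.)


def final_word_matches(main_line: str, pattern: str) -> bool:
    """True if a word matching `pattern` is immediately followed by a delimiter."""
    _, _, text = main_line.partition(":")  # drop the speaker code
    tokens = text.split()
    return any(
        nxt in DELIMITERS and fnmatchcase(word.lower(), pattern)
        for word, nxt in zip(tokens, tokens[1:])
    )


if __name__ == "__main__":
    for line in ["*CHI:\tyou going where ?", "*CHI:\twhere you going ?"]:
        print(line.strip(), "->", final_word_matches(line, "wh*"))
```

The same function covers the French exercise with the pattern "qu*".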
Tracking Initial Words (p. 68 in the CLAN Manual)
Because there is no special character that marks the beginnings of utterances, it is difficult to compose search strings to track items in utterance-initial position. To solve this problem, you can use CHSTRING to insert utterance-initial markers. A good marker to use is the ++ symbol, which is only rarely used for other purposes. You can use this command:
chstring +c -t@ -t% +t* *.cha
You also need to have a file called changes.cut that has this one line:
": " ": ++"
In this one-line file, there are two quoted strings. The first has a colon followed by a tab. The second has a colon followed by a tab and then a double plus.
Make a changes.cut file under \LIB
Initial qu* (Magda’s question)
Now try out: chstring +c -t@ -t% +t* 55.cha
And see what you get.
CHSTRING changes the ": " but not in initial position! How come?
My guess: the colon after a tier name is not read as part of the utterance.
At the moment I have no answer to this question.
Final qu* (Magda’s question)
RUN: combo +s"qu*^(\!+?+.)" *.cha +u +t*MOT
For French the question words are que, qui, quoi, quand, quel
Now use a COMBO command to extract all French question words in final position: 1) in all files in the Champaud folder under the LIB directory; 2) for the mother only; 3) merging all files together.
A COMBO search (Magda)
RUN: combo +spens*^*qu* +u *.cha
How would you do this when looking in the files in the Champaud folder for: 1) penser + question word; 2) all speakers; 3) merging all files together?
Recall that COMBO searches for a combination of words combined with Boolean operators like "and" or "or"
Some operators used with combo
^ immediately followed by
* repeated character
+ OR
! NOT
Examples:
want directly followed by to: combo +s"want^to" sample.cha
want eventually followed by to: combo +s"want^*^to" sample.cha
both want and to in any order: combo +s"want^to" +x sample.cha
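The difference between these three searches is easiest to see on token lists. The Python sketch below implements the three relations for exact words only (no wildcards, no tiers); it illustrates what each search asks for and is not how COMBO is implemented. The function names are hypothetical.

```python
def directly_followed(tokens: list[str], a: str, b: str) -> bool:
    """a^b : b comes immediately after a."""
    return any(x == a and y == b for x, y in zip(tokens, tokens[1:]))


def eventually_followed(tokens: list[str], a: str, b: str) -> bool:
    """a^*^b : b comes somewhere after the first a."""
    return a in tokens and b in tokens[tokens.index(a) + 1:]


def both_in_any_order(tokens: list[str], a: str, b: str) -> bool:
    """Both words present, order ignored (what the slide describes for +x)."""
    return a in tokens and b in tokens


if __name__ == "__main__":
    for text in ["I want to go", "want you to go", "to want"]:
        t = text.split()
        print(text, directly_followed(t, "want", "to"),
              eventually_followed(t, "want", "to"), both_in_any_order(t, "want", "to"))
```

Checking only the first occurrence of a is enough for "eventually followed": if b follows any later a, it also follows the first one.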
Piping possibilities
If you want to extract all and only the utterances with (for example) a specific verb and its conjugations, you might want to combine the options +s and -s. BUT in KWAL the +s and -s options cannot be used together.
For example: kwal +swait* -swaiter +d adam17.cha won't work.
However, you can use the piping function: kwal +swait* +d adam17.cha | kwal -swaiter +d
Try this out for: 1) all conjugations of the verb 'go'; 2) in file adam17; 3) for speaker = child.
Make a frequency list to see which words +sgo* picks up: freq +sgo* adam17.cha
Pipe the +sgo* command with all words to be excluded: kwal +sgo* +d +t*CHI adam17.cha | kwal -s"good" -s"got" +d
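Piping works because each stage only needs to do one job: the first keeps utterances matching go*, the second removes those containing unwanted words. The same two-stage idea can be written as chained Python generators. This sketch assumes the second KWAL stage drops any utterance containing an excluded word, so an utterance with both "going" and "good" would also be lost; the function names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Iterator


def words(line: str) -> list[str]:
    _, _, text = line.partition(":")  # drop the speaker code
    return text.split()


def keep_matching(lines: Iterable[str], pattern: str) -> Iterator[str]:
    """First stage (like +s): keep utterances with a word matching the pattern."""
    for line in lines:
        if any(fnmatchcase(word, pattern) for word in words(line)):
            yield line


def drop_containing(lines: Iterable[str], *excluded: str) -> Iterator[str]:
    """Second stage (like -s): drop utterances that contain any excluded word."""
    for line in lines:
        if not any(word in excluded for word in words(line)):
            yield line


if __name__ == "__main__":
    child = ["*CHI:\tI going .", "*CHI:\tgood boy .", "*CHI:\tgot it .", "*CHI:\tgoes home ."]
    for line in drop_containing(keep_matching(child, "go*"), "good", "got"):
        print(line)
```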
More piping possibilities
How would you code the following?
Desired output: a frequency count of the words
- want and to, when either one of them is coded as an imitation on the %spa line
- neat, when it is a continuation on the %spa line
combo +s(want^to^*^%spa:^*^$INI*)+(neat^*^%spa:^*^$CON*) +t%spa +f +d sample.cha
freq +swant +sto +sneat sample.cmb.cex
The +s option specifies that the words want, to, and $INI may occur in any order on the selected tiers. The same holds for neat and $CON.
The +t%spa option must be added in order to allow the program to look at the %spa tier when searching for a match.
The +d option is used to specify that the information produced by the program, such as file name, line number and exact position of words on the tier, should be excluded from the output. This way the output is in a legal CHAT format and can be used as an input to another CLAN program, FREQ in this case. The same effect could also be obtained by using the piping feature.
HOW?
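The logic of this COMBO + FREQ combination, select utterances by a condition on the %spa tier, then count words in what was selected, can be sketched in Python. The codes $INI and $CON come from the slide; everything else is an assumption: each utterance is a (main line, %spa tier or None) pair, and a rule only requires one of its target words on the main line rather than the exact want^to sequence. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional


def tokens(line: str) -> list[str]:
    _, _, text = line.partition(":")  # drop the tier label
    return text.split()


def select(records: list[tuple[str, Optional[str]]],
           rules: list[tuple[set[str], str]]) -> Iterator[str]:
    """COMBO-like stage: yield a main line once if any (targets, code prefix) rule matches."""
    for main_line, spa_tier in records:
        if spa_tier is None:
            continue
        words, codes = tokens(main_line), tokens(spa_tier)
        for targets, prefix in rules:
            if any(w in targets for w in words) and any(c.startswith(prefix) for c in codes):
                yield main_line
                break  # like an OR: output the utterance only once


def freq(lines: Iterable[str], targets: set[str]) -> Counter:
    """FREQ-like stage: count the target words in the selected utterances."""
    counts = Counter()
    for line in lines:
        counts.update(w for w in tokens(line) if w in targets)
    return counts


if __name__ == "__main__":
    records = [("*CHI:\twant to go .", "%spa:\t$INI"),
               ("*CHI:\tneat .", "%spa:\t$CON"),
               ("*CHI:\tneat .", None)]
    chosen = select(records, [({"want", "to"}, "$INI"), ({"neat"}, "$CON")])
    print(freq(chosen, {"want", "to", "neat"}))
```

Passing the generator from the first stage straight into the second mirrors what the pipe does between the two CLAN commands.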
More on tracking a subset: the +z option
+z This option allows the user to select any range of words, utterances, or speaker turns to be analyzed. The range specifications should immediately follow the option.
+z10w analyze the first ten words only.
+z10u analyze the first ten utterances only.
+z10w-20w analyze 11 words starting with the 10th word.
etc.
Try this out for: 1) 0012.cha under LIB\; 2) speaker = child; 3) the first 20 utterances
mlu +z20u +t*CHI sample.cha
If the +z option is used together with the +s option, you should use piping.
Try this out for: 1) sample.cha; 2) speaker = child; 3) utterances 1-3; 4) the word Mommy
kwal +d +z1-3u +t*CHI sample.cha | kwal +sMommy
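The +z ranges are easy to get wrong by one: 10w-20w covers 11 words because both ends are included. A tiny Python sketch of that arithmetic, assuming 1-based counting with inclusive ends as the "11 words" above implies, is shown here; the helper names are hypothetical.

```python
def z_range(items: list, start: int, end: int) -> list:
    """Items start..end, counted from 1 and inclusive at both ends (like +z10w-20w)."""
    return items[start - 1:end]


def first(items: list, n: int) -> list:
    """The first n items (like +z10u or +z20u)."""
    return z_range(items, 1, n)


if __name__ == "__main__":
    words = [f"w{i}" for i in range(1, 31)]
    chunk = z_range(words, 10, 20)
    print(len(chunk), chunk[0], chunk[-1])
```

Restricting first and searching afterwards, for example keeping only the lines of first(utterances, 3) that contain Mommy, is the same order of operations as the kwal | kwal pipe.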
depfile.cut in the /lib directory (CHILDES\CLAN\LIB\depfile)
The depfile distributed with CLAN lists the legitimate CHAT headers and dependent tier names as well as many of the strings allowed within the main line and the various dependent tiers.
When running CHECK, you should have the file called depfile.cut located in your LIB (=library) directory, which you set from the Commands window. If the programs cannot find the depfile, they will query you for its location.
Choose LIB\ as your working directory.
Run CHECK on the file kid10.cha.
In the LIB directory kid10.cha has a large number of CHAT errors.
If CHECK finds errors, you have to fix them. See the CLAN manual, p. 51
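The kind of consistency check CHECK performs against depfile.cut can be pictured with a toy validator. This is not CHECK and not the depfile format: the allowed header and tier names below are a small hypothetical subset, and the script only looks at tier names and at whether each speaker code was declared in @Participants.

```python
# Hypothetical subsets standing in for what depfile.cut lists; the real file is much larger.
KNOWN_HEADERS = {"@Begin", "@End", "@Participants", "@ID"}
KNOWN_DEPENDENT_TIERS = {"%mor", "%spa", "%act"}


def declared_speakers(lines: list[str]) -> set[str]:
    """Speaker codes from a line like '@Participants:\tMOT Mother, CHI Child'."""
    for line in lines:
        if line.startswith("@Participants:"):
            _, _, text = line.partition(":")
            return {entry.split()[0] for entry in text.split(",") if entry.strip()}
    return set()


def check_tiers(lines: list[str]) -> list[tuple[int, str]]:
    """Report (line number, message) for unknown headers, tiers, or speakers."""
    speakers = declared_speakers(lines)
    errors = []
    for number, line in enumerate(lines, start=1):
        name = line.split(":", 1)[0].strip()
        if line.startswith("@") and name not in KNOWN_HEADERS:
            errors.append((number, f"unknown header {name}"))
        elif line.startswith("%") and name not in KNOWN_DEPENDENT_TIERS:
            errors.append((number, f"unknown dependent tier {name}"))
        elif line.startswith("*") and name[1:] not in speakers:
            errors.append((number, f"speaker {name[1:]} not in @Participants"))
    return errors


if __name__ == "__main__":
    sample = ["@Begin", "@Participants:\tMOT Mother, CHI Child", "*CHI:\that .",
              "%xyz:\tfoo", "*DAD:\thi .", "@End"]
    for number, message in check_tiers(sample):
        print(number, message)
```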
CHIP and repetitions in conversation (Magda)
CHIP analyzes specified pairs of utterances. CHIP can be used to explore parental input and the relation between speech acts and imitation. When you use CHIP for data in a publication, you should cite Sokolov and MacWhinney (1990). There are four major aspects of CHIP: (1) the tier creation system, (2) the coding system, (3) the technique for defining substitution classes, and (4) the nature of the summary statistics.
See the CLAN Manual at pages 53-59
CHIP compares two specified utterances and produces an analysis that it then inserts onto a new coding tier.
The first utterance in the designated utterance pair is the “source” utterance and the second is the “response” utterance. The response is compared to the source.
Speakers are designated by the +b and +c options (for example, +bMOT and +cCHI).
CHIP and repetitions in conversation
@Begin
@Participants: MOT Mother, CHI Child
*MOT: what-'is that?
*CHI: hat.
*MOT: a hat!
*CHI: a hat.
*MOT: and what-'is this?
*CHI: that a hat!
*MOT: yes that-'is the hat.
@End
RUN: chip +bMOT +cCHI chip.cha
RUN: chip +bMOT +cCHI -ns -nb chip.cha
There are four aspects of CHIP:
1) the tier creation system
2) the coding system
3) the technique for defining substitution classes
4) the nature of the summary statistics.
Try CHIP out for: 1) chip.cha file 2) +bMOT and +cCHI
There is a chip.cha file under CLAN to try this command out
With the options -ns and -nb, CHIP produces an analysis of the child's responses to an adult utterance only. Try this out.
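The core of what CHIP does, comparing a "source" utterance with the "response" that follows it, can be approximated by a word-overlap comparison. The Python sketch below pairs each source-speaker turn with an immediately following response-speaker turn and reports repeated, added and deleted words. It is a simplified stand-in: it does not use CHIP's coding system, substitution classes or summary statistics, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterator


def tokens(utterance: str) -> list[str]:
    """Lowercased words after the speaker code, with . ! ? stripped off."""
    _, _, text = utterance.partition(":")
    cleaned = (tok.strip(".!?").lower() for tok in text.split())
    return [tok for tok in cleaned if tok]


def compare(source: str, response: str) -> dict[str, list[str]]:
    """Which words the response repeats from, adds to, or drops from the source."""
    src, rsp = Counter(tokens(source)), Counter(tokens(response))
    return {
        "repeated": sorted((src & rsp).elements()),
        "added": sorted((rsp - src).elements()),
        "deleted": sorted((src - rsp).elements()),
    }


def response_pairs(lines: list[str], source: str, response: str) -> Iterator[tuple[str, str]]:
    """Adjacent turns, e.g. a *MOT line directly followed by a *CHI line."""
    turns = [line for line in lines if line.startswith("*")]
    for prev, cur in zip(turns, turns[1:]):
        if prev.startswith(f"*{source}:") and cur.startswith(f"*{response}:"):
            yield prev, cur


if __name__ == "__main__":
    transcript = ["*MOT: what-'is that?", "*CHI: hat.", "*MOT: a hat!", "*CHI: a hat."]
    for src, rsp in response_pairs(transcript, "MOT", "CHI"):
        print(src, "|", rsp, "|", compare(src, rsp))
```

Using Counter intersection and subtraction keeps repeated words with their multiplicity, so a response that says "a" twice against one "a" in the source counts one repetition and one addition.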
An exercise
Try out the piping option of:
combo +s(want^to^*^%spa:^*^$INI*)+(neat^*^%spa:^*^$CON*) +t%spa +f +d sample.cha
freq +swant +sto +sneat sample.cmb.cex
More on my website: http://www.let.uu.nl/~Jacqueline.vanKampen/personal/