Meaning In Language Research Group (MILa)

photo by S. Malamud

Corpus of Bilingual Russian Child Speech [BiRCh]

Sample corpus files

The BiRCh corpus is a work in progress. These files are updated whenever changes are made, and new files are posted as they become ready. All sample files below pertain to child B (USA, born Dec. 30, 2012).

Pseudonymized audio files
B__2014.10.24_20.26_01.mp3
B__2014.11.03_15.06_01.mp3

Pseudonymized transcriptions annotated for disfluencies (ELAN format, XML)
B_2014.10.24_20.26_01_an.eaf
B_2014.11.03_15.06_01_an.eaf
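
For readers who want to work with the .eaf files outside of ELAN, the following is a minimal sketch of reading time-aligned annotations with Python's standard library and listing them in chronological order across tiers. It relies only on the general ELAN XML layout (TIME_SLOT, TIER, ALIGNABLE_ANNOTATION, ANNOTATION_VALUE). The tier names, the utterances, and the function name `read_annotations` are invented for illustration and are not taken from the BiRCh files, whose actual tier structure may differ.

```python
import xml.etree.ElementTree as ET

# Tiny inline stand-in for an .eaf file. Tier names ("child", "mother")
# and annotation text are hypothetical, not copied from the corpus.
SAMPLE_EAF = """
<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1200"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="2800"/>
    <TIME_SLOT TIME_SLOT_ID="ts5" TIME_VALUE="3000"/>
    <TIME_SLOT TIME_SLOT_ID="ts6" TIME_VALUE="4100"/>
  </TIME_ORDER>
  <TIER TIER_ID="child">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>mama</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts5" TIME_SLOT_REF2="ts6">
        <ANNOTATION_VALUE>kot</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="mother">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a3" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>da</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def read_annotations(eaf_text):
    """Return (start_ms, end_ms, tier_id, text) tuples sorted by start time."""
    root = ET.fromstring(eaf_text)

    # Map each time slot ID to its value in milliseconds.
    # Slots without a TIME_VALUE (unaligned) are skipped.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }

    rows = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # ignore annotations without usable time alignment
            text = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((start, end, tier.get("TIER_ID"), text))

    # Interleave all tiers chronologically.
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows


for start, end, tier_id, text in read_annotations(SAMPLE_EAF):
    print(f"{start:>6} {end:>6} {tier_id}: {text}")
```

To apply the same sketch to a downloaded file, read its contents as text and pass them to `read_annotations`. Annotations that refer to other annotations rather than to time slots (ELAN's REF_ANNOTATION elements) are not handled here.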

Morphologically tagged pseudonymized transcriptions (XML)
B_2014.10.24_20.26_01_an.xml
B_2014.11.03_15.06_01_an.xml
Because of quirks in our pilot annotation software, these tagged files group utterances by speaker: first all of the child's utterances, then the mother's, then those of other speakers.
In the future, tagged files will be chronological and will include speaker tags (child, mother, father, etc.).

 

Corpus Materials

 

Instructions for parents
[English]
[Russian]

Guidelines for transcription & disfluency annotation
[English]
[Russian]

Relevant publications

[full citation on MILa: publications page]

Lưu, Malamud & Xue 2016

Dubinina & Malamud, forthcoming

Copyright 2013