Meaning In Language Research Group (MILa)

photo by S. Malamud

Corpus of Bilingual Russian Child Speech

Sample corpus files

The BiRCh corpus is a work in progress. These files are updated whenever changes are made, and new files are posted as they become ready. All sample files below pertain to child B (USA, born Dec. 30, 2012).

Pseudonymized audio files

Pseudonymized transcriptions annotated for disfluencies (ELAN format, XML)

Morphologically tagged pseudonymized transcriptions (XML)
        Because of quirks in our pilot annotation software, these tagged files group utterances by speaker: first all of the child's utterances, then the mother's, then other speakers'.
        In the future, tagged files will be in chronological order and will include speaker tags (child, mother, father, etc.).


Corpus Materials


Instructions for parents

Guidelines for transcription & disfluency annotation

Relevant publications

[full citation on MILa: publications page]

Lưu, Malamud & Xue 2016

Dubinina & Malamud, forthcoming

Copyright 2013