Research Department, NINJAL
Adjunct Researcher, Research Department, NINJAL
Adjunct Researcher, Research Department, NINJAL
Adjunct Researcher, Research Department, NINJAL
Research Department, NINJAL
Center for Language Resource Development, NINJAL
Adjunct Researcher, Research Department, NINJAL
Chiba University
Adjunct Researcher, Research Department, NINJAL
Technical Assistant, Research Department, NINJAL
We constructed the Corpus of Everyday Japanese Conversation (CEJC) and published it in March 2022. The main features of the CEJC are: i) a focus on conversations that occur naturally in the course of daily activities; ii) a balanced collection of everyday conversations that captures their diversity and facilitates the observation of natural conversational behavior in daily life; and iii) the publication of both audio and video data to support a better understanding of the mechanisms of real-life social behavior. Publishing a large-scale corpus of everyday conversations that includes video data is a new approach. The CEJC contains 577 conversations totaling 200 hours of speech, approximately 2.4 million words, and 1,675 speakers. In this paper, we describe the design and construction of the CEJC, including the recording method and devices, the structure of the corpus, the formats of the audio and video files, transcription, and annotations. We then examine how the conversations were selected and compiled in a balanced manner to reflect their variety.