Construction of Russian Translation Data for the "Balanced Corpus of Contemporary Written Japanese" and the Possibilities of Using Them in Japanese-Russian Comparative Studies
Item type
Departmental Bulletin Paper
Language
Japanese
Keywords (Japanese)
『現代日本語書き言葉均衡コーパス』, 対訳コーパス, ロシア語, 文末表現
Keywords (English)
"Balanced Corpus of Contemporary Written Japanese", parallel corpus, Russian, expressions at the end of sentences
Part of the "Balanced Corpus of Contemporary Written Japanese" (BCCWJ) has already been translated into English, Italian, Chinese, and Indonesian. We add Russian translation data for 16 samples from the newspaper (PN) core data of BCCWJ. The Japanese source text totals 16,657 short unit words, corresponding to 13,070 words in the Russian target text. The translation was done manually by a native speaker of Russian, and various difficulties arose from the significant structural and lexical differences between Japanese and Russian. This paper describes the data construction method and the key points we focused on while translating. We also manually aligned all sentences in the source text with the corresponding sentences in the translation and assigned an ID to each sentence; this workflow is explained as well. With translation and alignment in place, the original data and their translation function as a simple Japanese-Russian parallel corpus, which can be useful for Japanese-Russian comparative studies and for linguistic typology. As a case study illustrating possible uses of the new translation data, we compare Japanese sentence endings with their Russian counterparts.
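To make the idea of an ID-keyed, sentence-aligned parallel corpus concrete, the following is a minimal Python sketch. It is not the authors' actual data format: the `AlignedPair` class, the `PN-demo-...` ID scheme, and the helper `ja_ending` are hypothetical, and the two sentence pairs are invented for illustration, not taken from BCCWJ. Only the two word counts come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    """One aligned sentence pair (hypothetical record layout)."""
    sentence_id: str  # hypothetical ID scheme, not the one used in the paper
    ja: str           # Japanese source sentence
    ru: str           # Russian translation


# Invented example sentences for illustration only (NOT from BCCWJ).
pairs = [
    AlignedPair("PN-demo-0001", "今日は晴れです。", "Сегодня ясно."),
    AlignedPair("PN-demo-0002", "明日は雨が降るでしょう。",
                "Завтра, вероятно, будет дождь."),
]


def index_by_id(items):
    """Build a lookup table from sentence ID to aligned pair."""
    return {p.sentence_id: p for p in items}


def ja_ending(sentence, n=2):
    """Return the last n characters before sentence-final punctuation.

    A crude character-based proxy for a sentence ending; a real study
    would rely on the corpus's morphological annotation instead.
    """
    core = sentence.rstrip("。！？!?.")
    return core[-n:]


# Word counts reported in the abstract.
SOURCE_SHORT_UNIT_WORDS = 16_657
TARGET_WORDS = 13_070
ratio = TARGET_WORDS / SOURCE_SHORT_UNIT_WORDS

lookup = index_by_id(pairs)
for sid, pair in lookup.items():
    # Show each Japanese ending next to its aligned Russian sentence.
    print(sid, ja_ending(pair.ja), "|", pair.ru)
print(f"Russian words per Japanese short unit word: {ratio:.2f}")
```

Keying each pair by sentence ID keeps source and translation retrievable together, which is what allows a query on a Japanese sentence ending to return the aligned Russian sentence for comparison.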