Language Change Division, Research Department, NINJAL
Graduate Student, Humanities and Sociology, The University of Tokyo
Tokoha University
Chiba University
Language Change Division, Research Department, NINJAL
Language Change Division, Research Department, NINJAL
The Ninjobon Corpus is currently under construction as part of the Edo Period Collection of the Corpus of Historical Japanese. In October 2015, a trial version of the Ninjobon Corpus (a full-text search system in the Himawari edition), focusing on Hiyokurenri Hana no Shimadai, was publicly released. Construction of the corpus is currently at the stage of (1) faithfully transcribing the original printed books into text and (2) creating "Himawari" XML texts by applying minimal revisions to (1). The tag set for the XML texts is fundamentally based on that of the Sharebon Corpus, with additional tags for ligatures and revisions prepared for the Ninjobon. Further, a morphological analysis of the first volume of Hana no Shimadai yielded an analytical precision of approximately 87%. This low precision is caused by the large number of characteristically irregular readings in the Ninjobon. One challenge in constructing a corpus annotated with morphological information is how to handle the "rubies" attached to kanji characters with irregular native Japanese readings.
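As a rough illustration of the ruby problem, the following Python sketch separates the base text from its ruby readings so that the two can be passed to later processing independently. The element name `r`, the attribute `rt`, the wrapper element `s`, and the sample sentence are all hypothetical stand-ins chosen for this example; they are not taken from the actual tag set of the Ninjobon or Sharebon Corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <r rt="reading">base</r> marks a ruby annotation.
# This is an assumed convention for illustration, not the corpus's real tag set.
SAMPLE = '<s>その<r rt="とき">時</r>、<r rt="をんな">女</r>は笑った</s>'


def split_ruby(xml_text):
    """Return (surface_text, [(base, reading), ...]) for one XML fragment."""
    root = ET.fromstring(xml_text)
    surface = []
    readings = []
    if root.text:
        surface.append(root.text)
    for child in root:
        base = child.text or ""
        if child.tag == "r":
            # Record the irregular reading alongside the characters it annotates.
            readings.append((base, child.get("rt", "")))
        surface.append(base)
        if child.tail:
            surface.append(child.tail)
    return "".join(surface), readings


surface, readings = split_ruby(SAMPLE)
print(surface)
print(readings)
```

Keeping the readings as a separate list, rather than discarding them, would in principle let a morphological analyzer or a human reviewer consult them when the base kanji alone suggests a different reading.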