Diachronic Domain Adaptation of Word Sense Disambiguation in Corpus of Historical Japanese Using Word Embeddings

古宮, 嘉那子; 田邊, 絢; 新納, 浩幸; KOMIYA, Kanako; TANABE, Aya; SHINNOU, Hiroyuki

https://doi.org/10.15084/00003566

Name / File	License	Action
papers2303.pdf (609.6 kB)

Item type

Departmental Bulletin Paper(1)

Publication date

2022-07-27

Title

分散表現を利用した日本語歴史コーパスにおける語義曖昧性解消の通時適応

Title

Diachronic Domain Adaptation of Word Sense Disambiguation in Corpus of Historical Japanese Using Word Embeddings

Language

eng

Keywords (subject scheme: Other)

domain adaptation (領域適応)
historical corpus (歴史コーパス)
diachronic adaptation (通時適応)
word sense disambiguation (語義曖昧性解消)
word embeddings (分散表現)

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

departmental bulletin paper

ID registration

10.15084/00003566

ID registration type

JaLC

Authors

古宮, 嘉那子
田邊, 絢
新納, 浩幸
KOMIYA, Kanako
TANABE, Aya
SHINNOU, Hiroyuki

Author affiliations

Tokyo University of Agriculture and Technology (東京農工大学)
Ibaraki University (茨城大学)
Ibaraki University (茨城大学)

Abstract

Many studies have addressed word sense disambiguation (WSD) in contemporary Japanese using sense-tagged corpora. For historical Japanese, however, few sense-tagged corpora are available, which makes high-performance WSD difficult. Diachronic domain adaptation using contemporary Japanese text is therefore one possible way to improve WSD for historical Japanese. This study investigates the effect of fine-tuning word embeddings, one method of domain adaptation, on WSD in historical Japanese. A variety of fine-tuning scenarios were examined, including the case where word embeddings of contemporary Japanese (NWJC2vec) are fine-tuned with historical Japanese and the case where word embeddings trained on historical Japanese are fine-tuned with contemporary Japanese. In addition, when NWJC2vec was fine-tuned with the historical corpus, a method that fine-tunes the embeddings step by step in chronological order was also tested. The embeddings of the two words before and the two words after the target word were used as features for a support vector machine (SVM) classifier. Three scenarios were compared: (1) all examples from the contemporary corpus plus 80% of the examples from the historical corpus are used for training, and the remaining 20% of the historical examples are used for testing; (2) five-fold cross-validation using only the examples from the historical corpus; and (3) all examples from the contemporary corpus are used for training, and all examples from the historical corpus are used for testing. The best accuracy was achieved in scenario (2), the five-fold cross-validation on historical examples, with word embeddings trained on the historical corpus and fine-tuned with the contemporary corpus.
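The feature construction and the three evaluation scenarios described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`context_features`, `k_fold_indices`, `scenario_splits`) are hypothetical, zero vectors for missing context words are an assumption, and the 80/20 split is taken in corpus order only because the abstract does not say how it was drawn.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def context_features(tokens: Sequence[str], target: int,
                     emb: Dict[str, Vector], dim: int,
                     window: int = 2) -> Vector:
    """Concatenate the embeddings of `window` words on each side of the target.

    Assumption: positions outside the sentence and out-of-vocabulary words
    are represented by zero vectors (the abstract does not specify this).
    """
    feats: Vector = []
    offsets = list(range(-window, 0)) + list(range(1, window + 1))
    for offset in offsets:
        pos = target + offset
        vec = emb.get(tokens[pos]) if 0 <= pos < len(tokens) else None
        feats.extend(vec if vec is not None else [0.0] * dim)
    return feats


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return (train, test) index lists for k contiguous folds over range(n)."""
    folds = [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, folds[i]))
    return splits


def scenario_splits(contemporary: list, historical: list):
    """Build (train, test) data for the three scenarios in the abstract."""
    cut = int(len(historical) * 0.8)  # assumption: split in corpus order
    # (1) all contemporary + 80% historical for training, 20% historical for test
    s1 = (contemporary + historical[:cut], historical[cut:])
    # (2) five-fold cross-validation on historical examples only
    s2 = [([historical[i] for i in tr], [historical[i] for i in te])
          for tr, te in k_fold_indices(len(historical), 5)]
    # (3) train on all contemporary examples, test on all historical examples
    s3 = (contemporary, historical)
    return s1, s2, s3


# Toy usage with made-up 2-dimensional embeddings.
toy_emb = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
x = context_features(["a", "b", "TARGET", "c"], 2, toy_emb, dim=2)
s1, s2, s3 = scenario_splits(list(range(100, 103)), list(range(10)))
```

Each feature vector has length 2 × `window` × `dim` and could be passed, together with its sense label, to any SVM implementation; the fine-tuning of the embeddings themselves is outside this sketch.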

Publisher

National Institute for Japanese Language and Linguistics (国立国語研究所)

Bibliographic information

NINJAL Research Papers (国立国語研究所論集)

No. 23, pp. 59-73, issued 2022-07

ISSN

2186-1358

Format

application/pdf

Publication type (author version flag)

VoR

Publication type resource

http://purl.org/coar/version/c_970fb48d4fbd8a85

Versions

Ver.1

2023-05-15 14:43:44.758014
