Construction and Evaluation of "Database of Japanese Compound Verb Examples Based on Web Pages" (『Webデータに基づく複合動詞用例データベース』の構築と評価)

山口, 昌也; ヤマグチ, マサヤ; YAMAGUCHI, Masaya

https://doi.org/10.15084/00002222

Name / File	License	Action
papers1702.pdf (1.2 MB)

Item type

Departmental Bulletin Paper (紀要論文)

Publication date

2019-07-25

Title

『Webデータに基づく複合動詞用例データベース』の構築と評価

Title

Construction and Evaluation of "Database of Japanese Compound Verb Examples Based on Web Pages"

Language

jpn

Keyword

Subject scheme

Other

Subject

日本語複合動詞 (Japanese compound verb)

Keyword

Subject scheme

Other

Subject

用例データベース (example database)

Keyword

Subject scheme

Other

Subject

Webコーパス (web corpus)

Keyword

Language

Subject scheme

Other

Subject

Japanese compound verb

Keyword

Language

Subject scheme

Other

Subject

example database

Keyword

Language

Subject scheme

Other

Subject

web corpus

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_6501

Resource type

departmental bulletin paper

ID registration

10.15084/00002222

ID registration type

JaLC

Author

山口, 昌也

WEKO 7565

	山口, 昌也
ja-Kana	ヤマグチ, マサヤ

YAMAGUCHI, Masaya

Author affiliation

Description type

Other

Description

国立国語研究所研究系音声言語研究領域

Author affiliation (English)

Description type

Other

Description

Spoken Language Division, Research Department, NINJAL

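Abstract

Description type

Abstract

Description

This paper presents a method for constructing the "Database of Japanese Compound Verb Examples Based on Web Pages" and evaluates the result. The database is built to support analysis of the relationship between compound verbs and their component verbs. For each compound verb it records the examples, the word structure, and the case-analysis results, together with the examples and case-analysis results of the component verbs. To secure the required number of examples, the method builds a dedicated Web corpus for each individual verb; to reduce the amount of Web corpus that has to be built, it constructs the example database incrementally. The compound verbs registered in the database are collected semi-automatically, using the number of examples that can be gathered from the Web as the criterion. The method collected 3371 compound verbs, with a median of 1173 examples per verb. This covers about 77.2% of the entry words in the Iwanami Japanese dictionary (岩波国語辞典). As an evaluation, the collected examples were compared with a set of examples gathered from a general-purpose Web corpus of about 210 million words, which confirmed that at least 1000 examples could be collected for 1829 compound verbs spanning a wide range of occurrence probabilities. In addition, the cosine similarity with the distribution of case elements in the example set extracted from the general-purpose corpus was 0.878 for compound verbs and 0.919 for simple verbs. This result suggests that the distribution of the examples collected by this method resembles that of the general-purpose corpus and that bias in example collection is suppressed.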

Abstract (English)

Description type

Other

Description

This paper presents a method of constructing a database of Japanese compound verb examples, and evaluates the database. The objective of constructing this database is to analyze the relationship between Japanese compound verbs and their component verbs. Whether to include a compound verb in the database is determined semi-automatically by the number of examples that can be extracted from Web corpora built for the individual verbs. The resulting database consists of 3371 compound verbs (median number of examples per verb = 1173). It covers 77.2% of the relevant entry words in the Iwanami Japanese language dictionary. A comparison with a general-purpose Web corpus shows that this method made it possible to collect more than 1000 examples for 1829 compound verbs with a wide range of probability of occurrence. The average cosine similarity between the distributions of case-marked elements in the database examples and in those extracted from the Web corpus is 0.878 for compound verbs. This result suggests that the bias of the collected examples is controlled.
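
The cosine-similarity figure in the abstract can be illustrated with a small sketch. The record does not say how the paper represents a "distribution of case-marked elements", so the code below assumes, purely for illustration, that each verb has a frequency table keyed by (case particle, noun) pairs, and that the reported figure is a mean over verbs. The function names and the sample counts are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity between two sparse frequency tables (dicts)."""
    shared = set(freq_a) & set(freq_b)  # only shared keys contribute to the dot product
    dot = sum(freq_a[k] * freq_b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for an empty table
    return dot / (norm_a * norm_b)


def mean_similarity(database, corpus):
    """Mean cosine similarity over the verbs present in both collections."""
    verbs = sorted(set(database) & set(corpus))
    return mean(cosine_similarity(database[v], corpus[v]) for v in verbs)


# Hypothetical counts of (case particle, noun) pairs for two compound verbs.
database = {
    "取り出す": {("を", "本"): 30, ("から", "箱"): 12, ("が", "人"): 5},
    "走り回る": {("を", "庭"): 20, ("が", "犬"): 15},
}
corpus = {
    "取り出す": {("を", "本"): 25, ("から", "箱"): 10, ("が", "人"): 8, ("で", "手"): 2},
    "走り回る": {("を", "庭"): 18, ("が", "犬"): 11, ("で", "公園"): 4},
}

print(f"{mean_similarity(database, corpus):.3f}")
```

A value close to 1 means the two example sets put their weight on the same case elements in similar proportions, which is the sense in which the abstract reads a high similarity as low collection bias.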

Publisher

National Institute for Japanese Language and Linguistics (国立国語研究所)

Bibliographic information

国立国語研究所論集
en : NINJAL Research Papers

No. 17, pp. 15-34, issue date 2019-07

ISSN

Source identifier type

ISSN

Source identifier

2186-134X

ISSN

Source identifier type

ISSN

Source identifier

2186-1358

Bibliographic record ID

Source identifier type

NCID

Source identifier

AA12536262

Format

Description type

Other

Description

application/pdf

Author version flag

Publication type

VoR

Publication type resource

http://purl.org/coar/version/c_970fb48d4fbd8a85

Versions

Ver.1

2023-05-15 15:07:14.792386
