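コーパス日本語学のための言語資源 : 形態素解析用電子化辞書の開発とその応用
(Language Resources for Japanese Corpus Linguistics: Development of an Electronic Dictionary for Morphological Analysis and Its Application)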

伝, 康晴 (DEN, Yasuharu); 小木曽, 智信 (OGISO, Toshinobu); 小椋, 秀樹 (OGURA, Hideki); 山田, 篤 (YAMADA, Atsushi); 峯松, 信明 (MINEMATSU, Nobuaki); 内元, 清貴 (UCHIMOTO, Kiyotaka); 小磯, 花絵 (KOISO, Hanae)




https://doi.org/10.15084/00002185

File

kk_ngkgk_022_07.pdf (1.8 MB)

Item type

Departmental Bulletin Paper

Publication date

2019-03-25

Title

コーパス日本語学のための言語資源 : 形態素解析用電子化辞書の開発とその応用

Title (en)

The development of an electronic dictionary for morphological analysis and its application to Japanese corpus linguistics

Language

jpn

Keywords (subject scheme: Other)

電子化辞書 / electronic dictionary
形態素解析 / morphological analysis
データベース / database system
単位の斉一性 / uniformity of units
見出しの同一性 / identity of indexes

Resource type

departmental bulletin paper (http://purl.org/coar/resource_type/c_6501)

ID registration (JaLC)

10.15084/00002185

Authors

伝, 康晴 (デン, ヤスハル / DEN, Yasuharu), WEKO 7404
小木曽, 智信 (オギソ, トシノブ / OGISO, Toshinobu), WEKO 7405
小椋, 秀樹 (オグラ, ヒデキ / OGURA, Hideki), WEKO 7406
山田, 篤 (ヤマダ, アツシ / YAMADA, Atsushi), WEKO 7407
峯松, 信明 (ミネマツ, ノブアキ / MINEMATSU, Nobuaki), WEKO 7408
内元, 清貴 (ウチモト, キヨタカ / UCHIMOTO, Kiyotaka), WEKO 7409
小磯, 花絵 (コイソ, ハナエ / KOISO, Hanae), WEKO 7410

Author affiliations

千葉大学 / Chiba University
国立国語研究所 / The National Institute for Japanese Language
国立国語研究所 / The National Institute for Japanese Language
京都高度技術研究所 / ASTEM
東京大学 / The University of Tokyo
情報通信研究機構 / National Institute of Information and Communications Technology
国立国語研究所 / The National Institute for Japanese Language

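Abstract

We have developed UniDic, an electronic dictionary for morphological analysis designed for application to Japanese corpus linguistics. Annotating a large-scale corpus with morphological information requires a computer-based morphological analysis system, but existing dictionaries for morphological analyzers pose various problems when applied to corpus linguistics. One is inconsistency in unit definition: units are long in some cases and short in others. The other is that orthographic variants and allomorphs are not given the same index (headword). Focusing on how the dictionary handles this uniformity of units and identity of indexes, both important requirements for linguistic research, we describe its design principles and the dictionary database system that implements them. To demonstrate the usefulness of this design, we also present case studies of corpus analysis concerning variation in orthography and word form.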

Abstract (English)

In this paper, we describe the design and implementation of UniDic, an electronic dictionary for morphological analysis aimed particularly at application to Japanese corpus linguistics. Building a large-scale corpus has made it indispensable to use a computer-based automatic morphological analyzer. Existing dictionaries for morphological analyzers, however, reveal many problems when used in corpus linguistics, such as inconsistent definitions of word units and failure to handle allomorphs and orthographic variants. Our dictionary, in contrast, addresses the uniformity of units and the identity of indexes, which are important requirements for the linguistic analysis of corpora. We adopt a multi-level definition of word units, consisting of short-, middle-, and long-unit words, and a structured representation of indexes, composed of lemma, word form, orthography, and pronunciation. We develop a database system that straightforwardly implements this design, together with a user-friendly interface that lets dictionary builders search for and register entries while keeping track of the complex structure of the indexes. We also show how this structured representation benefits the analysis of morphologically annotated corpora, presenting case studies that investigate word-form variation in a spoken-language corpus and orthographic variation in a written-language corpus.
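The abstract's point about identity of indexes can be illustrated with a small sketch. It assumes that each analysed token carries four fields named after the index levels listed in the abstract (lemma, word form, orthography, pronunciation). The class `Token`, the function `variation`, the field names, and the sample data are all hypothetical and do not reproduce UniDic's actual schema or database. Grouping tokens by lemma lets orthographic and word-form variants be counted as one item rather than as unrelated strings.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


# Hypothetical record type: the four fields mirror the index levels named in
# the abstract (lemma, word form, orthography, pronunciation). They are not
# UniDic's actual column names.
@dataclass(frozen=True)
class Token:
    lemma: str          # shared by allomorphs and orthographic variants
    word_form: str      # distinguishes allomorphs (e.g. ヤハリ vs ヤッパリ)
    orthography: str    # written form (e.g. 子供 vs 子ども)
    pronunciation: str  # pronounced form


LEVELS = ("word_form", "orthography", "pronunciation")


def variation(tokens, level):
    """For each lemma, count how often each value at `level` occurs."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    table = defaultdict(Counter)
    for tok in tokens:
        table[tok.lemma][getattr(tok, level)] += 1
    # Plain dicts keep first-seen order, which makes the output easy to read.
    return {lemma: dict(counts) for lemma, counts in table.items()}


# Illustrative tokens, not drawn from any corpus; lemma labels are simplified.
SAMPLE = [
    Token("子供", "コドモ", "子供", "コドモ"),
    Token("子供", "コドモ", "子ども", "コドモ"),
    Token("子供", "コドモ", "子供", "コドモ"),
    Token("やはり", "ヤハリ", "やはり", "ヤハリ"),
    Token("やはり", "ヤッパリ", "やっぱり", "ヤッパリ"),
    Token("やはり", "ヤッパリ", "やっぱり", "ヤッパリ"),
]

if __name__ == "__main__":
    # Surface strings alone give four unrelated "words"; lemmas give two.
    print(len({t.orthography for t in SAMPLE}), len({t.lemma for t in SAMPLE}))  # 4 2
    print(variation(SAMPLE, "orthography")["子供"])  # {'子供': 2, '子ども': 1}
    print(variation(SAMPLE, "word_form")["やはり"])  # {'ヤハリ': 1, 'ヤッパリ': 2}
```

Keying the count on the lemma rather than on the surface string is what lets a study of orthographic variation (子供 vs 子ども) or word-form variation (ヤハリ vs ヤッパリ) treat the variants as a single item. In this sketch the level to inspect is simply a field name passed to `variation`.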

Publisher

国書刊行会 (Kokusho Kankokai)

Bibliographic information

日本語科学 (Japanese Linguistics), vol. 22, pp. 101-123, issued 2007-10-25

Format

application/pdf

Version type

VoR (Version of Record), http://purl.org/coar/version/c_970fb48d4fbd8a85


Record version

Ver.1, 2023-05-15 15:09:36.863739
