学習者コーパス研究における標本数の問題 (A Reconsideration of the Needed Sample Size in Learner Corpus Studies)

石川, 慎一郎; イシカワ, シンイチロウ; ISHIKAWA, Shin'ichiro

https://doi.org/10.15084/00001516

Name / File	License	Action
LRW-2017-18-O-B-2.pdf (944.9 kB)

Item type

Conference Paper(1)

Publication date

2018-03-20

Title

学習者コーパス研究における標本数の問題

Title

A Reconsideration of the Needed Sample Size in Learner Corpus Studies

Language

eng

Keywords

Subject scheme

Other

Subject

多言語母語の日本語学習者横断コーパス(I-JAS)

Keywords

Language

Subject scheme

Other

Subject

International Corpus of Japanese as a Second Language (I-JAS)

Resource type

Resource type identifier

http://purl.org/coar/resource_type/c_5794

Resource type

conference paper

Registered ID

10.15084/00001516

ID registration type

JaLC

Author

石川, 慎一郎

WEKO 4750
e-Rad 90320994

	石川, 慎一郎
ja-Kana	イシカワ, シンイチロウ
en	ISHIKAWA, Shin'ichiro

Author affiliation

Description type

Other

Description

Kobe University

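Conference details (name, venue, dates, organizer, etc.)

Description type

Other

Description

Conference: Language Resources Workshop 2017; Venue: National Institute for Japanese Language and Linguistics (NINJAL); Dates: September 5-6, 2017; Organizer: Center for Corpus Development, NINJAL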

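Abstract

Description type

Abstract

Description

Unlike native speaker corpora, for which large amounts of data are easy to collect, learner corpora face practical limits on the number of samples that can be gathered. The question is how many samples are needed, for each linguistic item under investigation, to obtain reasonably stable results. Major learner corpora in Japan and abroad are organized into modules by learners' country (L1), but the number of samples per module varies widely from corpus to corpus. Among written corpora, ICLE (English, essays) has 243-982, ICNALE (English, essays) has 200-800, and 日本語学習者作文コーパス (a corpus of essays by learners of Japanese) has 144-160. Among spoken corpora, LINDSEI (English, interview speech) has 50-53, ICNALE (English, monologues) has 200-600, and I-JAS (Japanese, interview speech, etc.) has 50. Using the native speaker and learner data in I-JAS, this paper surveys how the values of basic linguistic indices change as the number of samples analyzed is varied, and examines the point at which those values converge.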

Abstract (English)

Description type

Other

Description

The number of samples collected in learner corpora is generally small in comparison to native speaker corpora, but the extent to which the limited sample size influences the reliability of learner corpus studies has not yet been wholly elucidated. Therefore, we extracted short writing pieces from the International Corpus of Japanese as a Second Language (I-JAS) and prepared text sets of different sizes (n = 10, n = 20, n = 30, n = 40, and n = 50) for Chinese and Korean learners of Japanese as well as Japanese native speakers. We then examined the difference ratios observed across the five kinds of text sets with a focus on basic linguistic indices, such as the total number of tokens per text, and the frequencies of punctuation marks, nouns and verbs, and conjugation forms of verbs. Our analyses show that the influence of sample size is not as strong as generally expected, and that discussion of learners’ L2 production based on relatively small corpus data could be justified to some extent.
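
The comparison described in the abstract can be sketched in code. This is a minimal illustration only, not the author's procedure. The abstract does not define "difference ratio", so the sketch assumes it means the relative difference between an index's mean in an n-text subset and its mean in the largest set. The function name `difference_ratios`, the random subsetting, and the input format (one index value per text, e.g. tokens per text) are likewise assumptions.

```python
import random
import statistics


def difference_ratios(values, sizes=(10, 20, 30, 40, 50), seed=0):
    """Relative difference of each subset mean from the largest-subset mean.

    values: one numeric index value per text (hypothetical input format).
    sizes:  subset sizes to compare; the largest one serves as the baseline.
    Assumed definition: (mean_n - mean_max) / mean_max.
    """
    rng = random.Random(seed)      # fixed seed so the subsets are reproducible
    shuffled = list(values)
    rng.shuffle(shuffled)          # nested subsets are drawn from one random order

    largest = max(sizes)
    if len(shuffled) < largest:
        raise ValueError("need at least as many values as the largest subset size")

    baseline = statistics.mean(shuffled[:largest])
    if baseline == 0:
        raise ValueError("baseline mean is zero; the ratio is undefined")

    return {
        n: (statistics.mean(shuffled[:n]) - baseline) / baseline
        for n in sizes
    }


if __name__ == "__main__":
    # Made-up per-text token counts, for illustration only.
    demo_rng = random.Random(1)
    tokens_per_text = [demo_rng.randint(80, 200) for _ in range(50)]
    for n, ratio in difference_ratios(tokens_per_text).items():
        print(f"n = {n}: {ratio:+.3f}")
```

Under this assumed definition, ratios that stay close to zero for the smaller subsets would correspond to the kind of stability the abstract reports. Repeating the draw with several seeds would show how much the result depends on which texts happen to be selected.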

Bibliographic information

言語資源活用ワークショップ発表論文集
en : Proceedings of Language Resources Workshop

Vol. 2, pp. 154-163, issued 2017

Versions

Ver.1

2023-05-15 15:25:38.072199
