A corpus, in linguistics, is any coherent body of real-life(*) text or speech being studied. So yes, a book is a corpus. The fact that it's in one string doesn't matter, as long as you don't randomly shuffle the characters.
(*) As opposed to a bunch of made-up phrases shown to test subjects to measure their responses, as is commonly done in psycholinguistics.
Yes.
http://en.wikipedia.org/wiki/Text_corpus
Specifically, because it's used for statistics.
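To make the statistics point concrete, here is a minimal sketch of treating a book held in one string as a corpus and counting word frequencies. The `word_frequencies` helper, the regex tokenizer, and the sample sentence are all assumptions for illustration, not anything from the question; a real project would likely use a proper tokenizer.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count lowercase word tokens in a single string (hypothetical helper)."""
    # Crude tokenizer: runs of letters and apostrophes. This is an assumption;
    # swap in a real tokenizer for serious work.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Stand-in for a whole book loaded into one string.
book = "It was the best of times, it was the worst of times."
freqs = word_frequencies(book)
print(freqs.most_common(3))
```

The string is never shuffled or restructured here; it is only split into tokens, which is all a frequency count needs.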
Usually "corpus" is used to refer to a structured collection, but linguists would know what you're talking about.