Because of the structure of your data, a pre-trained model will probably perform poorly. Besides, the general organization, location, and person categories will probably not be useful to you.
I don't think the texts themselves are too short; most NER systems work on one sentence at a time. So training a NER library on your own training set will probably work well, for example Stanford NER: http://nlp.stanford.edu/ner/index.shtml
If you don't want to create a training set, you will need a dictionary of all the bands/artists. Then, of course, you can't find bands/artists that are missing from the dictionary.
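A minimal sketch of the dictionary approach, assuming the artist names fit in memory and that case-insensitive whole-word matching is good enough. The function name `find_known_artists` and the sample names are made up for illustration; they are not part of any NER library.

```python
import re

def find_known_artists(text, artists):
    """Return (start, end, matched_text) for each dictionary entry found in text."""
    if not artists:
        return []
    # Try longer names first so "The Rolling Stones" wins over "Stones".
    ordered = sorted(artists, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(name) for name in ordered) + r")(?!\w)",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]

# Hypothetical dictionary and input text.
artists = {"The Rolling Stones", "Stones", "Muse"}
hits = find_known_artists("Tonight: the rolling stones live with Muse", artists)
print([name for _, _, name in hits])
```

As the answer says, anything missing from `artists` is never found, so this only works as well as the dictionary is complete.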
There is a simple NER algorithm that could simplify the task a bit: take the words that may or may not form a named entity and search for them in Google or Yahoo (via API) twice: once as separate words and once as an exact phrase (i.e. in quotation marks). Divide the two result counts. There is a threshold (<30) on this ratio that determines whether the words form a named entity.
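A rough sketch of that heuristic. The answer does not say which count is divided by which, so this assumes the ratio is (hits for the separate words) / (hits for the exact phrase) and that a ratio below 30 means "named entity". The search call is left as a caller-supplied function `hit_count`, which is a hypothetical stand-in for whatever search API you use; the counts in the example are invented.

```python
def looks_like_named_entity(words, hit_count, threshold=30.0):
    """Guess whether `words` form a named entity from search result counts.

    hit_count: callable taking a query string and returning a result count
               (hypothetical; wrap your search API of choice here).
    """
    phrase = " ".join(words)
    loose_hits = hit_count(phrase)              # words searched separately
    exact_hits = hit_count('"' + phrase + '"')  # exact phrase, in quotes
    if exact_hits == 0:
        # The phrase never occurs verbatim, so treat it as not an entity.
        return False
    # Assumed direction: a small ratio means the words mostly occur together.
    return loose_hits / exact_hits < threshold

# Stand-in for a real search API, with invented counts.
fake_counts = {
    "arcade fire": 5000, '"arcade fire"': 1000,
    "blue chair": 900000, '"blue chair"': 100,
}
fake_hit_count = lambda query: fake_counts.get(query, 0)

print(looks_like_named_entity(["arcade", "fire"], fake_hit_count))
print(looks_like_named_entity(["blue", "chair"], fake_hit_count))
```

Passing the search function in keeps the heuristic testable offline and independent of any particular provider.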