Open-source options for non-English term extraction?
I am looking for an open-source project that performs term extraction in multiple languages.
I have already found Yahoo BOSS Term Extraction Web Service, and it is good. However, it does not handle languages other than English.
Are there any open source term extraction projects that support more languages?
Thanks!
From the packages I've used in production or just played around with, the following were the most comprehensive and most actively maintained:
GATE - A computer architecture for a broad range of Natural Language Processing tasks, available under the GNU Public License
LingPipe (Java) - A suite of Java libraries for the linguistic analysis of human language which can link entity mentions to database entries, uncover relations, cluster documents, ...
OpenNLP (Java) - Java machine learning toolkit for natural language processing (NLP). It supports the most common NLP tasks.
NLTK (Python) - NLTK is a leading platform for building Python programs to work with human language data.
Proxem Antelope (.Net) - Advanced Natural Language Object-oriented Processing Environment
Scala-NLP (Scala)
Stanford NLP (Java)
Also, there are some good web APIs, such as:
Zemanta
Open-Calais
GATE - General Architecture for Text Engineering: http://gate.ac.uk/
Will do term extraction, keyword sorting and selection, sentiment analysis, all that good stuff.
Open-source, free, from the UK. Does a whole host of languages, including Arabic.
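To give a rough idea of what "keyword sorting and selection" involves, here is a minimal, language-agnostic sketch in plain Python. It is not GATE's algorithm: it simply counts word n-grams, drops candidates that start or end with a stopword, and ranks the rest by frequency. The function name `rank_terms` and the stopword list are hypothetical, for illustration only.

```python
import re
from collections import Counter


def rank_terms(text, stopwords, max_n=2, top_k=5):
    """Rank candidate terms (1..max_n word n-grams) by raw frequency.

    Hypothetical helper: a naive frequency baseline, not GATE's method.
    """
    # \w is Unicode-aware for str patterns, so Latin, Cyrillic, Arabic, etc.
    # tokenize; scripts written without spaces (e.g. Chinese) need a real segmenter.
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Discard candidates that begin or end with a stopword
            if gram[0] in stopwords or gram[-1] in stopwords:
                continue
            counts[" ".join(gram)] += 1
    # Highest frequency first; ties broken alphabetically for stable output
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


text = "Die Termextraktion findet Fachbegriffe. Die Termextraktion ist nützlich."
print(rank_terms(text, stopwords={"die", "ist"}))
```

Real extractors add linguistic filters (part-of-speech patterns, lemmatization) and statistical scores such as C-value or TF-IDF on top of a baseline like this, which is where per-language support in toolkits like GATE matters.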
你可以尝试Linnaeus——它有点直接从科学论文中提取物种名称,但我认为你可以给它你自己的字典,并用于其他领域/任务。
You can try Linnaeus. It is geared toward extracting species names from scientific papers, but I think you can give it your own dictionaries and use it for other domains and tasks.
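The dictionary-driven approach described above can be sketched as a greedy longest-match lookup over word tokens. This is a hypothetical illustration of the general idea, not Linnaeus's API (Linnaeus itself is a Java tool); the function name `dictionary_match` and the identifiers in the sample dictionary are made up. The sketch assumes dictionary keys are lowercase and single-space separated.

```python
import re


def dictionary_match(text, dictionary):
    """Find dictionary terms in text by greedy longest match over word tokens.

    `dictionary` maps lowercase, single-space-separated surface forms
    (possibly multi-word) to an identifier. Hypothetical helper.
    """
    tokens = re.findall(r"\w+", text)
    lowered = [t.lower() for t in tokens]
    # The longest entry (in tokens) bounds the lookup window
    max_len = max((len(k.split()) for k in dictionary), default=0)
    matches = []
    i = 0
    while i < len(tokens):
        found = None
        # Try the longest window first so "homo sapiens" wins over "homo"
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = " ".join(lowered[i:i + n])
            if key in dictionary:
                found = (" ".join(tokens[i:i + n]), dictionary[key])
                i += n
                break
        if found is not None:
            matches.append(found)
        else:
            i += 1
    return matches


# Hypothetical user-supplied dictionary; swap in any domain vocabulary
species = {"homo sapiens": "SP-1", "homo": "GEN-1", "escherichia coli": "SP-2"}
print(dictionary_match("Homo sapiens and Escherichia coli were sampled.", species))
```

Because matching is plain token lookup, the same code works for any language whose words are separated by spaces; the quality of the results depends almost entirely on the dictionary you supply.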