We don’t allow questions seeking recommendations for software libraries, tutorials, tools, books, or other off-site resources. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Christopher Manning maintains a list of NLP resources (POS taggers, NP chunking, sequence models, parsers...) in C++ and other languages. There is another one on Wikipedia.
There is also a Boost page for string and text processing.
Of course, it depends on exactly what you want to do.
GATE and UIMA are both frameworks for NLP, mostly designed around the idea of information management and extraction. It's not really fair to say GATE has more features than UIMA, since strictly speaking they are both only frameworks. However, GATE is bundled with ANNIE, which does have a lot of nice features that may be useful to you (again, depending on what you want to do). UIMA is bundled with the OpenNLP libraries, which mirror some, but not all, of these features, but they are written in Java and so would require loading the JVM.
You could find similar features to GATE/ANNIE or UIMA/OpenNLP using C++ libraries, but the nice thing about the two frameworks is that they are coherent and don't require a lot of 'glue code' to make individual libraries talk to each other.
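To make the "glue code" point concrete: both GATE and UIMA pass data between components as standoff annotations, i.e. typed spans identified by character offsets over one shared document, so each component only has to agree on that format. The following is a minimal C++ sketch of the idea under that assumption. It is not the API of either framework; `Annotation`, `Document`, `tokenize` and `markCapitalized` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical standoff annotation: a typed span over the document text,
// identified by character offsets rather than by copying the text.
struct Annotation {
    std::string type;                             // e.g. "Token"
    std::size_t start;                            // inclusive offset
    std::size_t end;                              // exclusive offset
    std::map<std::string, std::string> features;  // arbitrary key/value data
};

// Hypothetical shared document: every component reads the same text and
// appends to the same annotation list, so no component needs to know
// another component's internal data structures.
struct Document {
    std::string text;
    std::vector<Annotation> annotations;

    void annotate(const std::string& type, std::size_t start, std::size_t end) {
        assert(start <= end && end <= text.size());
        annotations.push_back({type, start, end, {}});
    }

    // Returns the substring of the text covered by an annotation.
    std::string covered(const Annotation& a) const {
        return text.substr(a.start, a.end - a.start);
    }
};

// Component 1: a naive whitespace tokenizer that adds "Token" annotations.
void tokenize(Document& doc) {
    std::size_t i = 0;
    const std::size_t n = doc.text.size();
    while (i < n) {
        while (i < n && std::isspace(static_cast<unsigned char>(doc.text[i]))) ++i;
        const std::size_t start = i;
        while (i < n && !std::isspace(static_cast<unsigned char>(doc.text[i]))) ++i;
        if (i > start) doc.annotate("Token", start, i);
    }
}

// Component 2: a toy rule that only consumes "Token" annotations and adds a
// "capitalized" feature; it never depends on how the tokenizer was written.
void markCapitalized(Document& doc) {
    for (Annotation& a : doc.annotations) {
        if (a.type != "Token") continue;
        const bool cap = std::isupper(static_cast<unsigned char>(doc.text[a.start])) != 0;
        a.features["capitalized"] = cap ? "true" : "false";
    }
}
```

Running `tokenize` and then `markCapitalized` on `Document{"GATE and UIMA are frameworks", {}}` yields five `Token` annotations, with `GATE` and `UIMA` marked as capitalized. Because the two components agree only on the annotation format, a different tokenizer could be swapped in without touching the second component; when separate libraries each use their own token types, writing that conversion yourself is the glue code the frameworks save you.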
What's the reason behind not wanting to wrap GATE in C++ code? I can appreciate that it would add to the complexity of the project, but if your worries are about performance/memory, then the JVM may be the least of your worries. NLP tools tend to be very memory hungry: expect to give up half a gig for NER models, and more for a statistical parser.
Maybe you would like to take a look at NLP++, a programming language tailored for Natural Language Processing and Text Analytics.
I recommend starting here:
Getting Started Package for NLP++
This package contains everything you need to get started with NLP++. Yes, you have to learn a new programming language, but it is similar to C++ and you don't have to use a black-box API. Further, a compiled text analyzer in VisualText creates a Visual Studio solution that you can include in your other C++ projects.
You can use VisualText and NLP++ for free for non-commercial projects.
Join the NLP++ Community to ask questions, discuss your analyzers and to learn more about NLP++:
NLP++ Community
Kind regards,
Dominik Holenstein
NLP++ Community Manager