NLP programming tools using PHP?
Since big web applications came into existence, searching for data (and doing it quickly and accurately) has been one of the most important problems in web applications. For a while, I've worked with Lucene.NET, which is a C# port of the Lucene project.
I also work in PHP with Zend Framework's Lucene API, which brings me to my question. Most of the time, providing good indexing requires some NLP preprocessing such as tokenizing, lemmatizing, and more. The question is:
Do you know of any good NLP programming framework or toolset for PHP?
PS: I'm very aware of the Zend API for Lucene, but indexing data properly is not just storing it and relying on Lucene; you also need to perform some extra tasks, like those above.
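To make the preprocessing mentioned in the question concrete, here is a minimal PHP sketch of where tokenizing and stemming sit before terms reach an index. It assumes ASCII-oriented English text; the stop-word list and the suffix-stripping `naiveStem()` function are hypothetical illustrations, not a real lemmatizer. A proper Porter stemmer or dictionary-based lemmatizer would take the place of `naiveStem()`.

```php
<?php
// Minimal preprocessing sketch (ASCII-oriented English text assumed):
// lowercase -> tokenize -> drop stop words -> naive suffix stripping.
// naiveStem() is a hypothetical stand-in, NOT a real stemmer or lemmatizer.

// Hypothetical, deliberately tiny stop-word list.
const STOP_WORDS = ['a', 'an', 'and', 'the', 'of', 'to', 'in', 'is'];

/** Lowercase ASCII letters, then split on runs of non-letter, non-digit characters. */
function tokenize(string $text): array
{
    $tokens = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false on error (e.g. invalid UTF-8 input).
    return $tokens === false ? [] : $tokens;
}

/** Strip one common English suffix; a crude stand-in for real stemming. */
function naiveStem(string $token): string
{
    foreach (['ing', 'ed', 'es', 's'] as $suffix) {
        $len = strlen($suffix);
        // Only strip when at least three characters of stem would remain.
        if (strlen($token) > $len + 2 && substr($token, -$len) === $suffix) {
            return substr($token, 0, -$len);
        }
    }
    return $token;
}

/** Full pipeline: tokens -> stop-word filter -> naive stems. */
function preprocess(string $text): array
{
    $terms = [];
    foreach (tokenize($text) as $token) {
        if (!in_array($token, STOP_WORDS, true)) {
            $terms[] = naiveStem($token);
        }
    }
    return $terms;
}

print_r(preprocess('Indexing the documents and searching the index'));
```

Here the call returns `['index', 'document', 'search', 'index']`: "indexing" and "index" collapse to the same term, so a query for "indexed" (which `naiveStem()` also reduces to "index") can match a document that only contains "indexing".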
Answers (3)
I would suggest that you look at Solr, which is a best-practice implementation of Lucene. Solr uses a REST-based API that also has a very good PHP client. This lets you leverage the power of Lucene without doing any of the low-level programming needed to get the NLP capabilities you want. Also, you would probably want to grab the trunk version of Solr, as NLP development is very active right now and new capabilities are being added every day.
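Because Solr is exposed over HTTP, it can be queried from PHP even without a client library. The sketch below uses only built-in functions (`http_build_query`, `file_get_contents`, `json_decode`) and Solr's standard `/select` handler with the `q`, `wt` and `rows` parameters. It does not show the PHP client library the answer mentions; the base URL and core name are assumptions (the exact path depends on your Solr version and setup), and `file_get_contents()` on a URL requires `allow_url_fopen` to be enabled.

```php
<?php
// Sketch: query Solr's /select handler over plain HTTP, no client library.
// The base URL / core name used in the usage line is hypothetical.

/** Build a /select URL that asks Solr for JSON results. */
function buildSolrSelectUrl(string $baseUrl, string $query, int $rows = 10): string
{
    $params = [
        'q'    => $query, // Lucene/Solr query syntax
        'wt'   => 'json', // response writer: JSON
        'rows' => $rows,  // maximum number of documents to return
    ];
    // Pass '&' explicitly so the arg_separator.output ini setting cannot change it.
    return rtrim($baseUrl, '/') . '/select?' . http_build_query($params, '', '&');
}

/** Run the query and return the decoded "response" section (numFound, docs, ...). */
function solrSearch(string $baseUrl, string $query, int $rows = 10): array
{
    $body = file_get_contents(buildSolrSelectUrl($baseUrl, $query, $rows));
    if ($body === false) {
        throw new RuntimeException('Solr request failed');
    }
    $data = json_decode($body, true);
    if (!is_array($data) || !isset($data['response'])) {
        throw new RuntimeException('Unexpected Solr response');
    }
    return $data['response'];
}

// Usage (needs a running Solr instance at a URL of your choosing):
// $result = solrSearch('http://localhost:8983/solr/mycore', 'title:lucene');
// echo $result['numFound'], " matches\n";
```

Keeping URL construction separate from the HTTP call makes it easy to swap `file_get_contents()` for cURL or a proper client later without touching the query parameters.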
Zend has a full port of Lucene to PHP. See the docs here.
Seems like you are looking for the same stuff I googled a few months back :D... I'm running a PHP/Zend-based project with Solr (via the php-solr-client lib), and so far I haven't found anything in PHP for advanced NLP. For basic stuff, as everyone mentions, you can get away with Solr (stemming, tag clouds / phrase tag clouds, tokenizing, etc.), and there are a few basic but useful text-processing PHP libraries out there (nothing fancy really; better to rely on Solr itself). But if you are looking for more algorithmic/semantic/sentiment NLP analysis, I suggest you move a bit away from PHP and into Java, as there are more libraries that can help you in this area (such as OpenNLP).
If the advanced stuff is what you are looking for, you might want to take a look at Mahout:
http://www.lucidimagination.com/blog/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/
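As an illustration of the tag clouds mentioned in the last answer, one possible approach is to let Solr do the counting via faceting and only scale the counts in PHP. This sketch assumes a request sent with `facet=true`, `facet.field=tags` and `wt=json`, where the facet field arrives as Solr's default flat `[term, count, term, count, ...]` list. The `tags` field and the sample JSON are hypothetical, and linear font-size scaling is just one design choice.

```php
<?php
// Sketch: build tag-cloud font sizes from Solr facet counts.
// Assumes facet_counts.facet_fields.<field> is a flat [term, count, ...] list.
// The "tags" field and the sample response below are hypothetical.

/** Convert Solr's flat [term, count, ...] list into term => count. */
function facetPairs(array $flat): array
{
    $counts = [];
    for ($i = 0; $i + 1 < count($flat); $i += 2) {
        $counts[(string) $flat[$i]] = (int) $flat[$i + 1];
    }
    return $counts;
}

/** Map counts linearly onto font sizes between $minPx and $maxPx. */
function tagCloudSizes(array $counts, int $minPx = 12, int $maxPx = 32): array
{
    if ($counts === []) {
        return [];
    }
    $lo = min($counts);
    $hi = max($counts);
    $sizes = [];
    foreach ($counts as $term => $count) {
        // Avoid division by zero when every term has the same count.
        $ratio = $hi === $lo ? 1.0 : ($count - $lo) / ($hi - $lo);
        $sizes[$term] = (int) round($minPx + $ratio * ($maxPx - $minPx));
    }
    return $sizes;
}

// Hand-written response fragment for illustration only (not real data).
$response = json_decode(
    '{"facet_counts":{"facet_fields":{"tags":["php",12,"lucene",7,"nlp",2]}}}',
    true
);
$sizes = tagCloudSizes(facetPairs($response['facet_counts']['facet_fields']['tags']));
print_r($sizes);
```

With the sample fragment, "php" gets 32px, "lucene" 22px and "nlp" 12px; a logarithmic scale is a common alternative when a few tags dominate the counts.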