Zend Lucene and MySQL database
I have a PHP website with its data stored in a MySQL database (approximately 50,000 articles).
I want to improve the results of the full-text search and stop relying on a simple LIKE query.
I found Zend_Search_Lucene in the Zend Framework, which seems to be a great tool.
Do you think Zend_Search_Lucene is a good choice in my case?
After indexing all my articles with Lucene, do I need to keep the data in MySQL, or is Zend_Search_Lucene enough to hold all the data?
Thanks in advance,
Comments (2)
I would investigate whether MySQL's native full-text searching meets your needs before jumping to a Lucene-based solution. It is a major improvement over LIKE statements, without the additional implementation work that Lucene requires. Zend_Search_Lucene is a pure PHP implementation of Lucene and can therefore be pretty slow when used with large datasets. I would skip it and look at implementing Apache Solr. There is a PECL extension for it, which is documented here.
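As a rough sketch of what the native full-text approach might look like: the table name `articles`, the columns `id`, `title` and `body`, and the index name are all hypothetical, since the question does not show its schema. Note that InnoDB only gained FULLTEXT support in MySQL 5.6; on older versions the table would need to use MyISAM.

```sql
-- Hypothetical schema: an `articles` table with `id`, `title` and `body` columns.
-- Build a full-text index over the columns that should be searchable.
ALTER TABLE articles
    ADD FULLTEXT INDEX ft_articles_title_body (title, body);

-- Replace a query such as: WHERE body LIKE '%search terms%'
-- MATCH() must list the same columns as the FULLTEXT index.
SELECT id,
       title,
       MATCH(title, body) AGAINST ('search terms' IN NATURAL LANGUAGE MODE) AS score
FROM articles
WHERE MATCH(title, body) AGAINST ('search terms' IN NATURAL LANGUAGE MODE)
ORDER BY score DESC
LIMIT 20;
```

Unlike LIKE with a leading wildcard, this query can use the index and returns a relevance score to sort by. Very short words and stopwords are ignored by default, so results for such terms may differ from what LIKE returned.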
I have used MySQL's full-text search on over 200,000 documents with a good amount of data. My search times are around 0.5 to 2 seconds on popular terms, with a very rare 5 or 6 second response every so often. I update some data each day, so long-term caching doesn't work well, but if I could cache searches I could be looking at 0.2 seconds or lower for cached results.
I am testing a move to Zend_Search_Lucene, and so far the same searches come in under 1.5 seconds for the most-used terms.
All of the above is on a dedicated server with 2 GB of RAM and a Core 2 Duo.
I am no expert, but for 50,000 articles I agree with Treffynnon: check out full-text searching instead of using LIKE. If you do move to a new version of Zend_Search_Lucene, I believe the indexes are compatible with the Java version, so it may make a good gateway if, down the road, you add more articles and need more speed.