Adding documents to a Sphinx index and modifying their attributes without a full rebuild
I have a WordPress-based website that uses Sphinx as its search engine, with the usual cron job rebuilding the Sphinx index every N hours from the site's MySQL database. This works fine except when a post is created or edited: until the next rebuild, the post remains unindexed or indexed with obsolete attributes.
According to the Sphinx PHP API documentation, only already-indexed documents can be updated, and there is apparently no way to add a new document to an index without rebuilding it from scratch or merging it with a delta index. There is no way to remove a document from the index either.
Besides, a look at the UpdateAttributes source code shows that only numeric attributes can be updated (other types are rejected by an assertion). This makes me think that updating an index on the fly isn't really encouraged by the Sphinx developers.
Is there any way around this, so that the index can be modified not only on a schedule for everything but also on demand for particular documents? Or is that bad practice with Sphinx, and is a frequently updated delta index merged into the main one an acceptable solution even when only a single document needs updating?
Thanks in advance.
Comments (1)
You can try Sphinx real-time (RT) indexes (http://sphinxsearch.com/docs/current.html#rt-overview), which let you add a single document to an existing index without rebuilding the whole index.
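To make that a bit more concrete: RT indexes are written to through SphinxQL (Sphinx's MySQL-protocol interface) with `INSERT`, `REPLACE` and `DELETE` statements, so a post can be pushed into the index when it is saved. Below is a rough Python sketch of what that might look like. The index name `rt_posts`, its columns (`title`, `content`, `author_id`) and the helper function names are all made up for illustration, and the `%s` placeholders assume a DB-API client that substitutes parameters on the client side.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical RT index name; it would have to be declared in sphinx.conf
# with "type = rt" plus the matching rt_field / rt_attr_* lines.
RT_INDEX = "rt_posts"


def build_upsert(index: str, doc_id: int, columns: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Build a SphinxQL REPLACE statement that adds a new document or
    overwrites an existing one with the same id."""
    names = ["id"] + list(columns)
    placeholders = ", ".join(["%s"] * len(names))
    sql = "REPLACE INTO {} ({}) VALUES ({})".format(index, ", ".join(names), placeholders)
    return sql, [doc_id] + list(columns.values())


def build_delete(index: str, doc_id: int) -> Tuple[str, List[Any]]:
    """Build a SphinxQL DELETE statement that removes one document by id."""
    return "DELETE FROM {} WHERE id = %s".format(index), [doc_id]


def upsert_post(cursor: Any, doc_id: int, title: str, content: str, author_id: int) -> None:
    """Index (or re-index) a single post. `cursor` is any DB-API style cursor
    connected to searchd's SphinxQL listener (an assumption of this sketch)."""
    sql, params = build_upsert(
        RT_INDEX,
        doc_id,
        {"title": title, "content": content, "author_id": author_id},
    )
    cursor.execute(sql, params)


def delete_post(cursor: Any, doc_id: int) -> None:
    """Remove a single post from the RT index."""
    sql, params = build_delete(RT_INDEX, doc_id)
    cursor.execute(sql, params)
```

In practice the cursor would come from a MySQL client library connected to whatever SphinxQL port `searchd` listens on in your setup, and `upsert_post` / `delete_post` would be called from the hooks that fire when a post is saved or deleted. Note that a `REPLACE` rewrites the whole document, so string fields can be changed this way as well, not just numeric attributes.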