Scalable full-text search with results sorted per user
What options exist for creating a scalable, full text search with results that need to be sorted on a per user basis? This is for PHP/MySQL (Symfony/Doctrine as well, if relevant).
In our case, we have a database of workouts that have been performed by users. The workouts that the user has done before should appear at the top of the results. The more frequently they've done the workout, the higher it should appear in search matches. If it helps, you can assume we know the number of times a user has done a workout in advance.
Possible Solutions
Sphinx - Use Sphinx to implement full text search, do all the querying and sorting in MySQL. This seems promising (and there's a Symfony Plugin!) but I don't know much about it.
Lucene - Use Lucene to perform full text search and put the user's completed workouts into the query, as suggested in this Stack Overflow thread. Alternatively, use Lucene to retrieve the results, then reorder them in PHP. However, both solutions seem clunky and potentially unscalable, as a user may have completed hundreds of workouts.
MySQL - No native full text support (InnoDB), so we'd have to use LIKE or REGEXP, which isn't scalable.
MySQL does have native FULLTEXT support, though only in MyISAM tables (InnoDB gained FULLTEXT indexes in MySQL 5.6).
For most real-world tasks, Sphinx is the fastest engine. However, it is an external index, so it can only be updated periodically, for example with a cron script.
By using SphinxSE (a pluggable MySQL interface to Sphinx), you can join MySQL tables and Sphinx indexes in one query. Updating, though, will still require an external script.
Since the number of workouts performed seems to change frequently, keeping it in Sphinx would require too much effort on rebuilding the index.
With SphinxSE, you can write one query that joins the Sphinx index to the MySQL table holding the per-user counts.
I'm not sure why you're assuming using Lucene would be unscalable. Hundreds of workouts per user is not a lot of data to deal with.
Try using Solr/Lucene for the search backend. It has a JSON/XML interface which will play nicely with your PHP frontend. Store the number of times each user has completed each workout in a database table. When a query is issued, take the results from Solr, look up that user's counts in the database table, and re-sort the results in PHP code. Should be plenty fast and scalable. With Solr, maintaining your index is dirt simple; just issue add/update/delete requests to your Solr server.
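The re-sort step is small enough to sketch. The following is a minimal illustration in Python rather than PHP, and the function name, argument shapes, and sample ids are all hypothetical. It assumes the search engine has already returned workout ids in relevance order and that the user's completion counts have been loaded from the database into a dictionary.

```python
from typing import Dict, List


def rerank_for_user(hit_ids: List[int], user_counts: Dict[int, int]) -> List[int]:
    """Reorder search hits so workouts the user has done most often come first.

    hit_ids: workout ids in the relevance order returned by the search engine.
    user_counts: workout id -> number of times this user completed it
                 (hypothetical shape of the per-user table lookup).
    """
    # sorted() is stable, so hits with equal counts keep the engine's
    # relevance order; workouts never done (count 0) fall to the bottom.
    return sorted(hit_ids, key=lambda wid: -user_counts.get(wid, 0))


if __name__ == "__main__":
    hits = [17, 4, 99, 23, 8]        # ids returned by the search engine
    counts = {23: 12, 4: 3, 99: 3}   # this user's completion counts
    print(rerank_for_user(hits, counts))  # [23, 4, 99, 17, 8]
```

Because the sort only touches the ids in the current result set, its cost depends on the number of hits rather than on how many workouts the user has completed overall. The same logic could be written in PHP with `usort` and an explicit tie-break on the original position, since a stable sort is what preserves the engine's relevance order among ties.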