Sphinx, Rails, ThinkingSphinx, and making some words more important than others in a query
I have a list of keywords that I need to search against using ThinkingSphinx.
Some of them are more important than others, so I need a way to weight those words.
So far, the only solution I have come up with is to repeat a word in the query several times to increase its relevance.
For example:
Three keywords, each with a level of importance: Blue (1), Recent (2), Fun (3).
I run this query:
MyModel.search "Blue Recent Recent Fun Fun Fun", :match_mode => :any
Not very elegant, and quite limiting.
Does anyone have a better idea?
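To make the workaround concrete, here is a minimal sketch of how the repeated-word query could be generated from the importance levels instead of being typed by hand. The helper name `weighted_query` is made up for illustration; only the final, commented-out `MyModel.search` call comes from the question.

```ruby
# Hypothetical helper: expands {keyword => importance} into a query string
# in which each keyword appears as many times as its importance level.
def weighted_query(importance)
  importance.flat_map { |word, level| [word] * level }.join(" ")
end

query = weighted_query({ "Blue" => 1, "Recent" => 2, "Fun" => 3 })
puts query

# With Thinking Sphinx loaded, the search from the question would then be:
# MyModel.search query, :match_mode => :any
```

This only automates the repetition trick; it inherits the same limitations, since the weights are still expressed as word counts in the query text.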
Comments (2)
If you can get those keywords into separate fields, then you could give those fields a higher weight. That's about the only good approach I can think of, though.
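A rough sketch of what that might look like, assuming the keywords are split into two indexed fields. The field names `important_keywords` and `keywords` and the weight values are invented for illustration; `:field_weights` is the Thinking Sphinx search option for per-field weighting.

```ruby
# Hypothetical field names; both would have to be indexed as separate fields.
FIELD_WEIGHTS = { :important_keywords => 10, :keywords => 1 }

# Builds the options hash that would be passed to the search call.
def search_options(field_weights = FIELD_WEIGHTS)
  { :match_mode => :any, :field_weights => field_weights }
end

p search_options

# With Thinking Sphinx loaded:
# MyModel.search "Blue Recent Fun", search_options
```

Note that this weights fields rather than individual words, so it only helps if the important keywords can be separated out at index time.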
I've been using Sphinx extensively lately, and since the death of UltraSphinx I've been using Pat's great plugin. (Thanks Pat, I'll buy you a coffee in Melbourne soon!)
I see a possible solution based on your original idea, but you need to make the changes to the data at "index time", not at "run time".
Try this:
Modify your Sphinx SQL query to replace "Blue" with "Blue Blue Blue", "Recent" with "Recent Recent Recent" and "Fun" with "Fun Fun". This magnifies every occurrence of your special keywords in the indexed text.
e.g. SELECT REPLACE(my_text_col, 'blue', 'blue blue blue') AS my_text_col ...
You probably want to do them all at once, so just nest the REPLACE calls.
e.g. SELECT REPLACE(REPLACE(my_text_col, 'fun', 'fun fun'), 'blue', 'blue blue blue') AS my_text_col ...
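If the keyword list grows, nesting the calls by hand gets tedious. A minimal sketch of generating the nested expression in Ruby follows; `amplified_column_sql` is a made-up helper, and it assumes the keywords are trusted plain words (no quotes to escape) and that none of them is contained in another, since each replacement runs over the output of the previous one. It emits single-quoted string literals.

```ruby
# Hypothetical helper: wraps `column` in one REPLACE call per keyword,
# repeating each keyword `count` times in the replacement text.
def amplified_column_sql(column, multipliers)
  expr = multipliers.reduce(column) do |inner, (word, count)|
    repeated = ([word] * count).join(" ")
    "REPLACE(#{inner}, '#{word}', '#{repeated}')"
  end
  "#{expr} AS #{column}"
end

puts amplified_column_sql("my_text_col", { "fun" => 2, "blue" => 3 })
```

The resulting string would go into the select list of the SQL query that Sphinx indexes from.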
Next, change your ranking mode to SPH_RANK_WORDCOUNT. This mode ranks by keyword occurrence counts, so the amplified keywords contribute the most to relevance.
(Optional) Imagine you have a list of keywords related to your special keywords. For example "pale blue" relates to "blue" and "pleasant" relates to "fun". At run time, rewrite the query text to look for the target word instead. You can store these words easily in a hash, and then loop through it to make the replacements.
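The optional rewrite step could be sketched like this. The `RELATED` hash and the `rewrite_query` helper are illustrative names, and the sketch lowercases the query first, which assumes case does not matter to your index.

```ruby
# Hypothetical mapping from related phrases to the special keyword they stand for.
RELATED = { "pale blue" => "blue", "pleasant" => "fun" }

# Replaces each related phrase (matched on word boundaries) with its target keyword.
def rewrite_query(text, related = RELATED)
  related.reduce(text.downcase) do |query, (phrase, target)|
    query.gsub(/\b#{Regexp.escape(phrase)}\b/, target)
  end
end

puts rewrite_query("Pale blue pleasant day")
```

The rewritten text is what you would then hand to the search call in place of the user's original query.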
Now you have quasi-keyword-clustering too. Sphinx is really a fantastic technology, enjoy!