Sphinx, Rails, ThinkingSphinx: making some words more important than others in a query

Posted 2024-07-18 00:19:13

I have a list of keywords that I need to search against using ThinkingSphinx.
Some of them are more important than others, so I need a way to weight those words.

So far, the only solution I have come up with is to repeat the same word x times in my query to increase its relevance.
For example, given three keywords, each with a level of importance, Blue (1), Recent (2) and Fun (3),
I run this query:

MyModel.search "Blue Recent Recent Fun Fun Fun", :match_mode => :any
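To avoid typing the repetitions by hand, the weighted query string can be generated from a hash of keyword => importance. This is only a sketch of the same repetition trick, not a ThinkingSphinx feature; the `weighted_query` helper is hypothetical, and whether repeated query terms actually raise the score still depends on Sphinx's matching and ranking mode.

```ruby
# Hypothetical helper: repeat each keyword as many times as its weight,
# so "more important" words appear more often in the query string.
def weighted_query(weights)
  weights.flat_map { |word, weight| [word] * weight }.join(' ')
end

weights = { 'Blue' => 1, 'Recent' => 2, 'Fun' => 3 }
query = weighted_query(weights)
puts query # prints "Blue Recent Recent Fun Fun Fun"

# The string is then passed to ThinkingSphinx as before:
#   MyModel.search query, :match_mode => :any
```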

Not very elegant, and quite limiting.
Does anyone have a better idea?

赠我空喜 2024-07-25 00:19:13

If you can get those keywords into a separate field, then you could weight those fields to be more important. That's about the only good approach I can think of, though.

MyModel.search "Blue Recent Fun", :field_weights => {"keywords" => 100}
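As a rough illustration of why a separately weighted field helps, here is a toy scorer in plain Ruby. It is loosely modeled on Sphinx's WORDCOUNT ranker (per-field keyword occurrence counts multiplied by field weights, then summed) and does not call ThinkingSphinx; the field names `keywords` and `body` and the `field_weighted_score` helper are hypothetical.

```ruby
# Toy model of field-weighted scoring: count query-word occurrences in
# each field, multiply each count by that field's weight, and sum.
# Fields missing from field_weights default to a weight of 1.
def field_weighted_score(doc, query_words, field_weights)
  doc.sum do |field, text|
    words = text.downcase.split
    hits = query_words.sum { |q| words.count(q.downcase) }
    hits * field_weights.fetch(field, 1)
  end
end

field_weights = { 'keywords' => 100, 'body' => 1 }
query_words   = %w[Blue Recent Fun]

doc_a = { 'keywords' => 'blue', 'body' => 'a quiet story' }
doc_b = { 'keywords' => '',     'body' => 'blue recent fun fun' }

puts field_weighted_score(doc_a, query_words, field_weights) # prints 100
puts field_weighted_score(doc_b, query_words, field_weights) # prints 4
```

A single hit in the heavily weighted `keywords` field outscores four hits in the body, which is the effect the `:field_weights` option is meant to achieve. Sphinx's default ranker also factors in phrase proximity and BM25, so real scores will differ.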
不知在何时 2024-07-25 00:19:13

Recently I've been using Sphinx extensively, and since the death of UltraSphinx, I started using Pat's great plugin (Thanks Pat, I'll buy you a coffee in Melbourne soon!)

I see a possible solution based on your original idea, but you need to make changes to the data at "index time" not "run time".

Try this:

  1. Modify the SQL query Sphinx uses to build your index so that "Blue" is replaced with "Blue Blue Blue Blue", "Recent" with "Recent Recent Recent" and "Fun" with "Fun Fun". This repeats every occurrence of your special keywords in the indexed text, magnifying their weight.

    e.g. SELECT REPLACE(my_text_col, 'blue', 'blue blue blue blue') AS my_text_col ...

    You probably want to do them all at once, so just nest the REPLACE calls.

    e.g. SELECT REPLACE(REPLACE(my_text_col, 'fun', 'fun fun'), 'blue', 'blue blue blue blue') AS my_text_col ...

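  2. Next, change your ranking mode to SPH_RANK_WORDCOUNT. This ranker scores a match by counting keyword occurrences in each field, multiplied by the field weights, so the repeated keywords carry the most weight. (Ranking modes only take effect in Sphinx's extended matching mode, so the search should use :match_mode => :extended rather than :any.)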

  3. (Optional) Imagine you have a list of keywords related to your special keywords. For example "pale blue" relates to "blue" and "pleasant" relates to "fun". At run time, rewrite the query text to look for the target word instead. You can store these words easily in a hash, and then loop through it to make the replacements.

# Trigger phrases as the keys,
# and the related special keyword as the values
trigger_words = {
  'pale blue' => 'blue',
  'pleasant'  => 'fun'
}

query = 'pale blue and pleasant'  # example search text

# Replace each trigger phrase with its special keyword. Matching whole
# phrases (rather than splitting the query on spaces) lets multi-word
# triggers such as "pale blue" match too.
new_query = query
trigger_words.each do |phrase, keyword|
  new_query = new_query.gsub(/\b#{Regexp.escape(phrase)}\b/i, keyword)
end
# new_query is now "blue and fun"
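When there are many special keywords, the nested REPLACE expression from step 1 can be generated rather than written by hand. A minimal sketch, assuming a MySQL-style REPLACE(str, from_str, to_str); `amplified_column` and `my_table` are hypothetical names, and the keywords are assumed to contain no quote characters because nothing is escaped.

```ruby
# Hypothetical helper: wrap `column` in one REPLACE() per special keyword,
# repeating each keyword `copies` times so it occurs more often in the
# indexed text. No SQL escaping is done, so keywords must not contain quotes.
def amplified_column(column, weights)
  weights.reduce(column) do |expr, (word, copies)|
    repeated = ([word] * copies).join(' ')
    "REPLACE(#{expr}, '#{word}', '#{repeated}')"
  end
end

# Same weights as step 1: blue x4, recent x3, fun x2
weights = { 'fun' => 2, 'recent' => 3, 'blue' => 4 }
expr = amplified_column('my_text_col', weights)
sql  = "SELECT #{expr} AS my_text_col FROM my_table"  # my_table is a placeholder
puts sql
```

Keep in mind that REPLACE() matches plain substrings (case-sensitively in MySQL), so 'fun' is also amplified inside words such as 'refund'.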

Now you have quasi-keyword-clustering too. Sphinx is really a fantastic technology, enjoy!
