How do I generate autocomplete suggestions that are not limited to prefix matches?

Posted 2024-09-06 02:24:34

I would like to add autocomplete to my tagging functionality.

A couple of questions:

  1. How do I generate a list of autocomplete suggestions that includes matches both at the start and in the middle of a string? For example, if the user types "auto", the autocomplete suggestions should include terms such as "autocomplete" and "build automation".

  2. I would like to allow multi-word tags and use a comma (",") as the separator between tags. So when the user hits the space bar, he is still typing the same tag, but when he hits the comma key, he's starting a new tag. How do I do that?

I am using Django, jQuery, MySQL, and Solr. What is the best way to implement the above two features?

漫雪独思 2024-09-13 02:24:34

I've implemented exactly what you're asking about and it works really well. There are two issues to be aware of:

  • Highlighting in the results list summaries doesn't work, and the suggested workaround also doesn't work in this particular case.
  • If your documents have long titles that you truncate for display, a suggestion may match on the prefix of a word that isn't shown. There are, of course, several ways to handle this.
  • In a future version, I'd like to give words towards the start of the title a bit more weight than words at the end. This would be one way to mitigate the previous issue.

Like the previous answer, I'd start with the same article linked above, but you DO want the Edge NGram analyzer. What you add is whitespace tokenization as well, so that each word of the title gets its own edge n-grams.

And then you'd make these changes to your schema.xml file. This example assumes you already have a field called "title" defined, and it's what you'd like to display as well. I create a second field, which is ONLY used for autocomplete prefix matching.

Step 1: Define Edge NGram Text field type

<types>
  <!-- ... other types ... -->

  <!-- Assuming you already have this -->
  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <!-- ... normal text definition ... -->
  </fieldType>

  <!-- Adding this -->
  <fieldType name="prefix_edge_text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- not using enablePositionIncrements="true" for now -->
      <filter class="solr.StopFilterFactory" words="stopwords.txt" />
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" />
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- No need to create Edges here -->
      <!-- Don't want stopwords here -->
    </analyzer>
  </fieldType>

</types>
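To see what this field type does to a title, here is a rough Python emulation of its two analyzer chains. Treat it only as a sketch. `STOPWORDS` is a hypothetical stand-in for `stopwords.txt`, and `str.split()` approximates `WhitespaceTokenizerFactory`. `matches()` assumes every typed word must match (AND), which is not Solr's default OR behavior. This is not how Solr works internally.

```python
# Rough emulation of the "prefix_edge_text" field type; not Solr's actual code.
# STOPWORDS is a hypothetical stand-in for the contents of stopwords.txt.
STOPWORDS = {"a", "an", "and", "of", "the", "to"}


def index_tokens(text, min_gram=1, max_gram=25):
    """Index analyzer: whitespace split, lowercase, drop stopwords, edge n-grams."""
    grams = set()
    for word in text.split():            # ~ WhitespaceTokenizerFactory
        word = word.lower()              # LowerCaseFilterFactory
        if word in STOPWORDS:            # StopFilterFactory
            continue
        # EdgeNGramFilterFactory: prefixes of each word, min_gram..max_gram chars long
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            grams.add(word[:n])
    return grams


def query_tokens(text):
    """Query analyzer: whitespace split and lowercase only (no n-grams, no stopwords)."""
    return [word.lower() for word in text.split()]


def matches(title, user_input):
    """True if every typed word equals one of the title's indexed edge n-grams (AND)."""
    grams = index_tokens(title)
    terms = query_tokens(user_input)
    return bool(terms) and all(term in grams for term in terms)


titles = ["autocomplete", "build automation", "Car repair"]
print([t for t in titles if matches(t, "auto")])  # ['autocomplete', 'build automation']
print(matches("build automation", "mation"))      # False: only word starts are indexed
```

"build automation" matches "auto" because "automation" was indexed as a, au, aut, auto, and so on. "mation" does not match, because edge n-grams only cover the start of each word. A query consisting only of a stopword such as "the" can also come up empty for a title. The index side dropped that word, but the query side keeps it.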

Step 2: Define the New Field

<fields>
  <!-- ... other fields ... -->

  <!-- Assuming you already have this -->
  <field name="title" type="text" indexed="true" stored="true" multiValued="true"/>

  <!-- Adding this -->
  <field name="prefix_title" type="prefix_edge_text" indexed="true" stored="true" multiValued="true" />

</fields>

Step 3: Copy the Title's content over to the prefix field during indexing

<!-- Adding this -->
<copyField source="title" dest="prefix_title" />
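Solr applies `copyField` itself at index time. It copies the raw input value before `prefix_title`'s own analyzer runs, so the Django side only has to send `title`. The helper below is a hypothetical name, not a Solr API. It just mimics that copy on a document represented as a dict of value lists:

```python
# Mimics <copyField source="title" dest="prefix_title" /> on a document given as
# {field_name: [values]}. In practice Solr does this itself during indexing.
def apply_copy_fields(doc, copy_fields):
    """Return a new doc with each source field's raw values appended to its dest field."""
    result = {field: list(values) for field, values in doc.items()}
    for source, dest in copy_fields:
        if source in doc:
            # The raw input is copied; dest's own analyzer (edge n-grams) runs afterwards.
            result.setdefault(dest, []).extend(doc[source])
    return result


doc = {"id": ["42"], "title": ["Build Automation"]}
print(apply_copy_fields(doc, [("title", "prefix_title")]))
# {'id': ['42'], 'title': ['Build Automation'], 'prefix_title': ['Build Automation']}
```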

That's pretty much it for the schema. Just remember:

  • When you do a regular search, you still search against the regular title field.
  • When you're doing an autocomplete search, search against the prefix_title field.
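Here is a minimal sketch of how the autocomplete request could be built on the Django side. The endpoint URL and the core name `tags` are assumptions. `df`, `q.op`, `fl`, `rows`, and `wt` are standard Solr request parameters. The user input is escaped so that characters such as `+` or `:` are not interpreted as query syntax by the standard Lucene query parser.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port, and core name ("tags") to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/tags/select"

# Characters the standard Lucene query parser treats specially.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_lucene(term):
    """Backslash-escape query-syntax characters in user input."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def autocomplete_url(user_input, rows=10):
    """Build a select URL that matches the typed words against prefix_title."""
    terms = [escape_lucene(t) for t in user_input.split()]
    if not terms:
        return None  # nothing typed yet: skip the request
    params = {
        "q": " ".join(terms),
        "df": "prefix_title",  # search the edge n-gram field...
        "q.op": "AND",         # ...requiring every typed word to match
        "fl": "title",         # ...but return the regular title for display
        "rows": rows,
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(autocomplete_url("build auto"))
# http://localhost:8983/solr/tags/select?q=build+auto&df=prefix_title&q.op=AND&fl=title&rows=10&wt=json
```

Choosing `q.op=AND` instead of Solr's default OR is a judgment call. AND narrows the suggestions as the user types more words. OR keeps suggesting titles that match any single word.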
中二柚 2024-09-13 02:24:34
  1. Use the NGramTokenizerFactory. Use the analysis console to see how it works. Also see this article (but you would use NGram instead of EdgeNGram).
  2. Not sure what you mean by "tags" but I guess you have a multivalued field "tags", so your code would parse the input (splitting by ",") before sending the data to Solr.
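Both points can be sketched in plain Python. `ngrams()` approximates NGram tokenization as every substring of length `min_gram` to `max_gram`, treating the whole lowercased value as one string. The real `NGramTokenizerFactory` may differ in details such as whitespace handling, so check it in the analysis console. `parse_tags()` and `current_fragment()` are hypothetical helpers for point 2. Only a comma ends a tag, so spaces stay part of the tag. The jQuery code would presumably apply the same rule and send only the current fragment to the autocomplete query.

```python
def ngrams(text, min_gram, max_gram):
    """Every substring of length min_gram..max_gram (approximates NGram tokenizing)."""
    text = text.lower()  # assumes a LowerCaseFilterFactory in the same chain
    grams = set()
    for n in range(min_gram, max_gram + 1):
        for start in range(len(text) - n + 1):
            grams.add(text[start:start + n])
    return grams


def parse_tags(raw):
    """Split tag input on commas only, so multi-word tags keep their spaces."""
    tags = []
    for part in raw.split(","):
        tag = " ".join(part.split())  # trim ends and collapse repeated spaces
        if tag and tag not in tags:   # drop empty and duplicate tags, keep order
            tags.append(tag)
    return tags


def current_fragment(raw):
    """Text after the last comma: the tag still being typed, to send to autocomplete."""
    return raw.rsplit(",", 1)[-1].strip()


print("mation" in ngrams("build automation", 2, 10))  # True: infix match
print("auto" in ngrams("build automation", 2, 3))     # False: longer than max_gram
print(parse_tags("build automation, solr,, Django ,solr"))
# ['build automation', 'solr', 'Django']
print(current_fragment("solr, build auto"))            # build auto
```

With `max_gram=3`, "auto" is not among the grams at all. Roughly speaking, if the query side is not n-grammed, `maxGramSize` bounds the longest string a user can match in the middle of a word. Each extra gram size also makes the index larger.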