Optimizing the "search for similar tag names" process
I have a tag table with a huge amount of data. I need to build an autocomplete textbox that searches for similar tags, just like Stack Overflow does. I tried MySQL LIKE,
but it's slow. I'm looking for a way to optimize this task.
Comments (1)
You can look into full-text indexing, Lucene, or Sphinx. Since you are only doing this on tags, you would also have to use some kind of n-gram tokenizer.
Usually you create an index on long content, or at least a couple of sentences' worth. Most tokenizers use spaces and punctuation to separate words; in your case it would be better to split on, for example, every 3 characters. So, for example, if your table contains
host
hosting
hosted
and a user types host, the engine would search for hos + t and find anything containing both of those pieces.
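To make the idea concrete, here is a minimal in-memory sketch in Python of an n-gram index over tags. It is only an illustration, not how Lucene, Sphinx, or MySQL implement it: the names `ngrams` and `TagIndex` are hypothetical, and it uses overlapping (sliding-window) 3-character grams, so host becomes hos and ost rather than hos + t.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of overlapping n-character substrings of text."""
    text = text.lower()
    if len(text) < n:
        # Too short to form a full gram: keep the whole string as one gram.
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


class TagIndex:
    """Hypothetical inverted index: gram -> set of tags containing it."""

    def __init__(self, n=3):
        self.n = n
        self.postings = defaultdict(set)

    def add(self, tag):
        for gram in ngrams(tag, self.n):
            self.postings[gram].add(tag)

    def search(self, query):
        q = query.lower()
        if not q:
            return []
        if len(q) < self.n:
            # Short query: gather tags from every gram that contains it.
            candidates = set()
            for gram, tags in self.postings.items():
                if q in gram:
                    candidates |= tags
        else:
            # Keep only tags that contain every gram of the query.
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in ngrams(q, self.n))
            )
        # Sharing all grams does not guarantee a real substring match, so verify.
        return sorted(t for t in candidates if q in t.lower())


index = TagIndex()
for tag in ["host", "hosting", "hosted", "ghost", "html"]:
    index.add(tag)

print(index.search("host"))
```

The lookup only touches the posting lists for the grams in the query instead of scanning every row, which is the reason an n-gram index tends to be faster than LIKE with a leading wildcard on a large table.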