Most efficient algorithm for comparing a string against a group of strings
I have a scenario where a user can post a number of responses or phrases via a form field. I would like to take the response and determine what they are asking for. For instance, if the user types in car, train, bike, jet, and so on, I can assume they are talking about a vehicle and respond accordingly. I understand that I could use a switch statement or perhaps a regexp, but the larger the number of possible responses, the less efficient that computation becomes. Is there an efficient algorithm for comparing a string against a group of strings? Any info would be great.
Comments (3)
You may want to look into the Aho-Corasick algorithm. If you have a collection of strings that you want to search for, you can preprocess them in time linear in their total length. From that point forward you can find every occurrence of those strings in a text of length n in O(n) time, plus the number of matches reported. In other words, after a small one-time setup cost, you can scan any number of inputs for those keywords very efficiently.
Interestingly enough, the algorithm was invented specifically to build a fast index (that is, to look for many different keywords in a huge body of text), and it reportedly outperformed other methods by a factor of ten. I think it would work well in your application.
Hope this helps!
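To make the preprocess-once, scan-many-times idea concrete, here is a minimal sketch of Aho-Corasick in Python. The function names `build_automaton` and `find_all` are made up for this illustration and do not come from any particular library; treat it as one possible way to write the algorithm, not a tuned implementation.

```python
from collections import deque


def build_automaton(keywords):
    """Preprocess the keywords into goto/fail/output tables (hypothetical helper)."""
    goto = [{}]   # goto[state][char] -> next state (a trie of the keywords)
    fail = [0]    # fail[state] -> state for the longest proper suffix that is also a trie prefix
    out = [[]]    # out[state] -> keywords that end at this state

    # Phase 1: insert every keyword into the trie.
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)

    # Phase 2: breadth-first pass to fill in the failure links.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit matches that end at the suffix state as well.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, keyword) for every keyword occurrence in text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            matches.append((i - len(word) + 1, word))
    return matches
```

The automaton is built once from the keyword list and can then be reused for every incoming form submission, for example `find_all(user_text.lower(), automaton)` with `automaton = build_automaton(["car", "train", "bike", "jet"])`. Note that this matches substrings, so "car" would also be found inside "cart".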
If you have a large number of "magic" words, I would suggest splitting the query into words and using a hash-based lookup to check whether each word is recognized.
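As a rough sketch of that suggestion, a Python dict (which is hash-based) can map each known word to a category, so each word in the query costs one average-case constant-time lookup no matter how many words are registered. The table `CATEGORY_BY_WORD`, the function `categorize`, and the sample categories are assumptions made up for this example.

```python
import re

# Hypothetical word -> category table; in practice it would be loaded from your own data.
CATEGORY_BY_WORD = {
    "car": "vehicle",
    "train": "vehicle",
    "bike": "vehicle",
    "jet": "vehicle",
    "apple": "fruit",
    "pear": "fruit",
}


def categorize(query):
    """Split the query into lowercase words and return the set of recognized categories."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    return {CATEGORY_BY_WORD[w] for w in words if w in CATEGORY_BY_WORD}
```

For example, `categorize("Car, train, bike, jet")` returns `{"vehicle"}`, and a query with no known words returns an empty set. This only recognizes whole words, so multi-word phrases would need extra handling.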
You could look at the trie data structure. I think it is one of the best solutions for your problem.
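For illustration, a bare-bones trie might look like the sketch below. The class names `TrieNode` and `Trie` and the idea of storing a category on the final node are assumptions for this example. A lookup takes time proportional to the length of the word being checked, independent of how many words are stored.

```python
class TrieNode:
    """One node per character; category is set only where a stored word ends."""
    __slots__ = ("children", "category")

    def __init__(self):
        self.children = {}
        self.category = None


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, category):
        """Add a word and the category it should map to."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.category = category

    def lookup(self, word):
        """Return the category for an exact word match, or None if the word is unknown."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.category
```

Usage would be along the lines of `trie.insert("car", "vehicle")` for each known word, followed by `trie.lookup(word)` for each word in the user's input. A prefix of a stored word, such as "ca", returns None because no category is set on that node.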