Parsing free-text (natural-language) queries with Solr
I'm trying to build a query parsing algorithm for a local search site that can classify a free-text search query (a single input text box) into the various types of searches possible on the site.
For example, the user could type "chinese restaurants near xyz". How should I go about breaking that down into Cuisine:"chinese", locality:"xyz", given that
- there could be spelling mistakes
- keywords may match in different columns, e.g. a restaurant may have "chinese" in its name
This is not really a natural-language-parsing problem, since we're searching within a very limited set of possibilities.
My initial thought is to dump all values of a particular type from the database into a field, and match the user's query against all those fields. Then, based on the score (and a predefined confidence level), divide the query into the 3-4 search fields like name/cuisine/locality.
Is there a better/standard way of doing this?
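A minimal, Solr-independent sketch of that match-and-score idea (the per-field vocabularies and the 0.8 confidence threshold here are made up for illustration; in practice the vocabularies would be dumped from the database tables):

```python
from difflib import SequenceMatcher

# Hypothetical per-field vocabularies, as dumped from the database.
FIELD_VALUES = {
    "cuisine": ["chinese", "italian", "thai"],
    "locality": ["xyz", "downtown"],
    "name": ["china garden", "pasta house"],
}

CONFIDENCE = 0.8  # predefined confidence threshold

def classify(query):
    """Assign each query token to the field whose vocabulary it matches best."""
    fields = {}
    for token in query.lower().split():
        best_field, best_score = None, 0.0
        for field, values in FIELD_VALUES.items():
            for value in values:
                # Fuzzy similarity, so minor misspellings still match.
                score = SequenceMatcher(None, token, value).ratio()
                if score > best_score:
                    best_field, best_score = field, score
        if best_score >= CONFIDENCE:
            fields.setdefault(best_field, []).append(token)
    return fields
```

Tokens below the confidence threshold (like "restaurants near" here) are simply dropped; a real implementation might instead fall back to searching them across all fields.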
1 Answer
About spelling mistakes, you have to work with a dictionary/thesaurus. This can be part of your pre-processing and normalization.
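Solr also ships a SpellCheckComponent that can suggest corrections from an index-backed dictionary; as a plain-Python sketch of the pre-processing idea (the term list here is hypothetical):

```python
from difflib import get_close_matches

# Hypothetical dictionary of known terms (cuisines, localities, etc.).
DICTIONARY = ["chinese", "italian", "restaurant", "xyz"]

def normalize(token):
    """Map a possibly misspelled token to its closest dictionary entry."""
    matches = get_close_matches(token.lower(), DICTIONARY, n=1, cutoff=0.8)
    return matches[0] if matches else token
```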
About querying across multiple columns, you can do: cuisine:chinese OR restaurant_name:chinese
You can boost one of the two: cuisine:chinese^0.8 OR restaurant_name:chinese
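That boosted multi-field expansion can be generated per token; a small sketch (the field names and boost values are just examples):

```python
def build_query(token, field_boosts):
    """Expand one token into a boosted multi-field Solr query clause."""
    clauses = []
    for field, boost in field_boosts.items():
        clause = f"{field}:{token}"
        if boost != 1.0:
            clause += f"^{boost}"  # Solr boost syntax: field:term^boost
        clauses.append(clause)
    return " OR ".join(clauses)
```

With the dismax/edismax query parsers, Solr can also do this expansion natively via the `qf` parameter, e.g. `qf=cuisine^0.8 restaurant_name`.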