What is the best way to represent a category hierarchy with term prefixes in Xapian?
Assume I have the following example hierarchy:
- US
  - Michigan
    - Detroit
    - Grand Rapids
    - Lansing
  - Minnesota
    - Grand Rapids
    - Minneapolis
    - St Paul
  - Ohio
    - Columbus
    - Grand Rapids
    - Sandusky
- Michigan
I see two ways that I could index a “Grand Rapids, Michigan” document with prefixed terms:
XFIRSTLEVELus
XSECONDLEVELmichigan
XTHIRDLEVELgrandrapids
or
XFIRSTLEVELus
XSECONDLEVELus_michigan
XTHIRDLEVELus_michigan_grandrapids
I'm inclined to use the second approach because I think it will return more intuitive results. That is, a search for Grand Rapids, Michigan is less likely to return documents from Minnesota and Ohio.
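To make the difference concrete, here is a minimal sketch in plain Python (it does not call Xapian) that builds the terms for both schemes and a toy posting list for each. The sample documents and the helper names `independent_terms`, `concatenated_terms` and `postings` are hypothetical, made up for illustration only.

```python
from collections import defaultdict

# Hypothetical sample data: (country, state, city) for each document id.
DOCS = {
    1: ("US", "Michigan", "Grand Rapids"),
    2: ("US", "Minnesota", "Grand Rapids"),
    3: ("US", "Ohio", "Grand Rapids"),
}

PREFIXES = ("XFIRSTLEVEL", "XSECONDLEVEL", "XTHIRDLEVEL")


def slug(name):
    """Lowercase a place name and drop spaces: 'Grand Rapids' -> 'grandrapids'."""
    return name.lower().replace(" ", "")


def independent_terms(path):
    """First scheme: each level is indexed on its own."""
    return [prefix + slug(name) for prefix, name in zip(PREFIXES, path)]


def concatenated_terms(path):
    """Second scheme: each level carries its ancestors, joined with '_'."""
    slugs = [slug(name) for name in path]
    return [prefix + "_".join(slugs[: i + 1]) for i, prefix in enumerate(PREFIXES)]


def postings(make_terms):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, path in DOCS.items():
        for term in make_terms(path):
            index[term].add(doc_id)
    return index


first = postings(independent_terms)
second = postings(concatenated_terms)

# The city term alone is shared by all three states in the first scheme...
print(sorted(first["XTHIRDLEVELgrandrapids"]))
# ...but identifies a single place in the second.
print(sorted(second["XTHIRDLEVELus_michigan_grandrapids"]))
```

Under the first scheme the same narrowing is still possible, but only by combining the state and city terms with AND; the second scheme bakes that combination into a single term.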
However, two aspects of this approach bother me. First, the creation and maintenance of term prefixes for each level of the hierarchy feels wrong. Second, the concatenation of values seems like a surrogate for using weights.
So, what is the best way to represent a hierarchy with term prefixes?
As with all these things, it might be best to think about how you want to use the data, rather than what the 'best' way of storing it is.
In the past, I have stored location data like you describe as if they were URL paths, converting each place name into a slug, so a location in your example above would look something like us/minnesota/minneapolis.
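A minimal sketch of how such paths could be generated for the whole hierarchy. The slug rule used here (lowercase, non-alphanumeric runs replaced by a hyphen) is an assumption; any rule works as long as it is applied consistently at index and query time. The names `HIERARCHY`, `slugify` and `paths` are hypothetical.

```python
import re

# The three states from the question, as nested dicts (leaf places map to {}).
HIERARCHY = {
    "US": {
        "Michigan": {"Detroit": {}, "Grand Rapids": {}, "Lansing": {}},
        "Minnesota": {"Grand Rapids": {}, "Minneapolis": {}, "St Paul": {}},
        "Ohio": {"Columbus": {}, "Grand Rapids": {}, "Sandusky": {}},
    }
}


def slugify(name):
    """Lowercase and replace runs of non-alphanumerics with '-' (assumed convention)."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def paths(tree, parent=""):
    """Yield one slash-separated path per node, parents before children."""
    for name, children in tree.items():
        path = parent + "/" + slugify(name) if parent else slugify(name)
        yield path
        yield from paths(children, path)


for p in paths(HIERARCHY):
    print(p)
```

Because each path already contains its ancestors, the three "Grand Rapids" entries come out as distinct strings without needing a separate prefix per level.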
Give each document a prefixed term with one of those paths, and use an exact term search to get only the documents in a specific place (location:us/minnesota/minneapolis) or a wildcard search to get all children of a location (location:us/minnesota/*).
This may or may not be the 'best' solution, but it might work for some applications :)
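The two kinds of lookup can be sketched in plain Python. This only mimics the matching and does not call Xapian; in a real index the term would typically be attached with `Document.add_boolean_term`, and wildcard support enabled in the query parser with `QueryParser.FLAG_WILDCARD`. The `XLOCATION` prefix, the sample documents and the function names are assumptions made for illustration.

```python
# Hypothetical term prefix for the location field.
PREFIX = "XLOCATION"

# Hypothetical documents, each indexed with exactly one location path term.
INDEX = {
    1: PREFIX + "us/minnesota/minneapolis",
    2: PREFIX + "us/minnesota/st-paul",
    3: PREFIX + "us/michigan/detroit",
    4: PREFIX + "us/minnesota",
}


def exact(path):
    """Documents whose location term equals the path exactly."""
    term = PREFIX + path
    return sorted(doc_id for doc_id, t in INDEX.items() if t == term)


def children(path):
    """Documents strictly below the path, like location:us/minnesota/*.

    The trailing slash is part of the pattern, so a document indexed at
    'us/minnesota' itself does not match.
    """
    stem = PREFIX + path.rstrip("/") + "/"
    return sorted(doc_id for doc_id, t in INDEX.items() if t.startswith(stem))


print(exact("us/minnesota/minneapolis"))
print(children("us/minnesota"))
```

Note that the wildcard form excludes documents filed directly at the parent level (document 4 above); if those should be returned too, the exact and wildcard queries would need to be combined with OR.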