What is the best way to represent a category hierarchy with term prefixes in Xapian?

Posted on 2024-12-07 04:20:43

Assume I have the following example hierarchy:

  • US
    • Michigan
      • Detroit
      • Grand Rapids
      • Lansing
    • Minnesota
      • Grand Rapids
      • Minneapolis
      • St Paul
    • Ohio
      • Columbus
      • Grand Rapids
      • Sandusky

I see two ways that I could index a “Grand Rapids, Michigan” document with prefixed terms:

XFIRSTLEVELus
XSECONDLEVELmichigan
XTHIRDLEVELgrandrapids

or

XFIRSTLEVELus
XSECONDLEVELus_michigan
XTHIRDLEVELus_michigan_grandrapids

I’m inclined to use the second approach, thinking that it will return more intuitive results. That is, a search with “Grand Rapids, Michigan” criteria is less likely to include documents from Minnesota and Ohio.
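To make the difference between the two schemes concrete, here is a minimal Python sketch that only builds the term strings and does not touch a Xapian database. The helper names (`normalize`, `scheme_one`, `scheme_two`) are hypothetical and exist only for illustration; the prefixes are the ones from the lists above.

```python
# Hypothetical helpers for illustration; prefixes are taken from the question.
LEVEL_PREFIXES = ["XFIRSTLEVEL", "XSECONDLEVEL", "XTHIRDLEVEL"]


def normalize(name):
    """Lowercase a place name and drop everything except letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def scheme_one(path):
    """First approach: each level is indexed on its own."""
    return [prefix + normalize(name) for prefix, name in zip(LEVEL_PREFIXES, path)]


def scheme_two(path):
    """Second approach: each level carries all of its ancestors."""
    terms = []
    parts = []
    for prefix, name in zip(LEVEL_PREFIXES, path):
        parts.append(normalize(name))
        terms.append(prefix + "_".join(parts))
    return terms


michigan = ["US", "Michigan", "Grand Rapids"]
ohio = ["US", "Ohio", "Grand Rapids"]

print(scheme_one(michigan))
print(scheme_two(michigan))
# In the first scheme the third-level term is shared by both cities;
# in the second scheme it is unique to each state.
print(scheme_one(michigan)[2] == scheme_one(ohio)[2])
print(scheme_two(michigan)[2] == scheme_two(ohio)[2])
```

With the first scheme, a query that matches on the third-level term alone cannot tell the three Grand Rapids apart; with the second, the term itself encodes the state.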

However, two aspects of this approach bother me. First, the creation and maintenance of term prefixes for each level of the hierarchy feels wrong. Second, the concatenation of values seems like a surrogate for using weights.

So, what is the best way to represent a hierarchy with term prefixes?


Comments (1)

秋凉 2024-12-14 04:20:43

As with all these things, it might be best to think about how you want to use the data, rather than what the 'best' way of storing it is.

In the past, I have stored location data like you describe as if they were URL paths, converting each place name into a slug, so your example above would look something like:

us
us/michigan
us/michigan/detroit
us/michigan/grand-rapids
us/michigan/lansing
us/minnesota
us/minnesota/grand-rapids
us/minnesota/minneapolis
us/minnesota/st-paul
us/ohio
us/ohio/columbus
us/ohio/grand-rapids
us/ohio/sandusky

Give each document a prefixed term with one of those paths, then use an exact term search to get only the documents in a specific place (location:us/minnesota/minneapolis) or a wildcard search to get all children of a location (location:us/minnesota/*).
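As a rough sketch of the path idea, the Python below builds the slug path for a location and simulates the two kinds of lookup with plain string comparison. The `XLOC` prefix and all function names are assumptions made for this example, not something from the answer. In a real index the term would typically be attached with something like `Document.add_boolean_term`, and the wildcard form would need wildcard support turned on in the query parser (for example `QueryParser.FLAG_WILDCARD`); that wiring is not shown here.

```python
import re

# Hypothetical term prefix that a "location:" field would map to.
LOCATION_PREFIX = "XLOC"


def slugify(name):
    """Turn a place name into a URL-style slug, e.g. 'St Paul' -> 'st-paul'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def location_term(names):
    """Build one prefixed term holding the whole path for a location."""
    return LOCATION_PREFIX + "/".join(slugify(n) for n in names)


def is_exact(term, names):
    """Simulates an exact term search such as location:us/minnesota/minneapolis."""
    return term == location_term(names)


def is_child(term, names):
    """Simulates a wildcard search such as location:us/minnesota/*."""
    return term.startswith(location_term(names) + "/")


# Terms that three example documents would carry.
docs = {
    "doc1": location_term(["US", "Minnesota", "Minneapolis"]),
    "doc2": location_term(["US", "Minnesota", "St Paul"]),
    "doc3": location_term(["US", "Ohio", "Grand Rapids"]),
}

in_minneapolis = [d for d, t in docs.items() if is_exact(t, ["US", "Minnesota", "Minneapolis"])]
in_minnesota = [d for d, t in docs.items() if is_child(t, ["US", "Minnesota"])]
print(in_minneapolis)
print(in_minnesota)
```

Because the trailing slash is part of the child check, a hypothetical `us/minnesota-north` would not be picked up as a child of `us/minnesota`.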

This may or may not be the 'best' solution, but it might work for some applications :)
