Lucene.Net: What is a good way to create an index more complex than key/value?

Posted on 2024-11-27 04:14:35


I'm starting a project in which we are trying to index the contents of XML documents with Lucene.Net. From the little documentation I have found, it seems that indexes can only consist of fields with a single string value. The data that I am attempting to index is slightly more complicated than simple key/value pairs.

Here is an example of an XML document I would want to generate an index from:

<descriptor>
  <asset guid="2AA7C8F9-2CB1-4A81-9421-C09F1D85939E" generated-date="2011-07-30" generated-by="hw/AutoMfg" generated-with="PMS">

    <!-- information about where the asset can be used -->
    <target>
      <localization>en-us</localization>
      <localization>es-us</localization>
      <environment>desktop</environment>
      <environment>mobile</environment>
    </target>

    <!-- all contents of an asset must have the same version -->
    <version-information>
      <version-number source="content">9.1.123.4</version-number>
      <version-number source="manufacturing">9.1.123.4</version-number>
      <release-label>9.1</release-label>
    </version-information>

    <!-- catalog information about the primary role of the asset -->
    <role>
      <namespace>parent.type.family.some.thing</namespace>
      <mime-type>text/html</mime-type>
      <hwid>abc1234</hwid>
    </role>

  </asset>
</descriptor>

So I can see creating fields named after the child elements of 'descriptor', but what about the child nodes within them? How can this data be indexed? Should I create a delimited string to represent the values of each field?

e.g.
field: "Target" value: "localization: en-us;es-us environment: desktop;mobile | ...

Do I need to flatten my data out like in my example above to index it?

Thanks!


蓝海似她心 2024-12-04 04:14:35

It's kind of tricky to give specific advice, since so much of it revolves around what you want to retrieve and how, rather than the shape of the data. In any case, I would start with Simone Chiaretta's excellent little series on Lucene.Net (1 2 3 4 5). One concept that will help a lot is that you can index the same field multiple times for a given document, so you'll probably end up with something like:

Target-Localization:en-us
Target-Localization:es-us
Target-Environment:desktop
Target-Environment:mobile

Lucene is fundamentally flat, but capable of being deep while being flat in new and interesting ways.
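To make the repeated-field idea concrete, here is a minimal sketch of the flattening step only, written in Python with the standard library rather than against the Lucene.Net API. The helper names `flatten` and `descriptor_fields`, the `-` separator, and the lowercase field names taken straight from the tag names are all assumptions for illustration. The XML is a trimmed copy of the descriptor from the question. In a real indexer, each emitted pair would presumably become one field added to the same Lucene document.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the descriptor from the question.
DESCRIPTOR = """\
<descriptor>
  <asset guid="2AA7C8F9-2CB1-4A81-9421-C09F1D85939E">
    <target>
      <localization>en-us</localization>
      <localization>es-us</localization>
      <environment>desktop</environment>
      <environment>mobile</environment>
    </target>
    <role>
      <mime-type>text/html</mime-type>
      <hwid>abc1234</hwid>
    </role>
  </asset>
</descriptor>
"""


def flatten(element, prefix=""):
    """Yield (field_name, value) pairs for every leaf element under `element`.

    Field names join the element path with '-'. A repeated element simply
    yields the same field name more than once, one pair per value.
    """
    for child in element:
        name = f"{prefix}-{child.tag}" if prefix else child.tag
        if len(child):  # element has children of its own: recurse into it
            yield from flatten(child, name)
        elif child.text and child.text.strip():
            yield name, child.text.strip()


def descriptor_fields(xml_text):
    """Return the flat list of (field_name, value) pairs for one descriptor."""
    asset = ET.fromstring(xml_text).find("asset")
    fields = [("guid", asset.get("guid"))]  # keep the asset id as its own field
    fields.extend(flatten(asset))
    return fields


fields = descriptor_fields(DESCRIPTOR)
for name, value in fields:
    print(f"{name}:{value}")
```

The point of the design is that `target-localization` appears twice, once per value, instead of once with a delimited string such as `en-us;es-us`. That way each value can be matched on its own without the query having to know about a home-made delimiter.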


飘过的浮云 2024-12-04 04:14:35

Take a look at Digester + Lucene. The .NET port of Digester is NDigester.


About the author: 梦里寻她
