XML text processing

Posted on 2024-11-30 18:14:39

I have a complex XML document that is already stored as a String, and I need to do some text/XML processing on it. The goal is to find the start index of a given node. Besides the node itself (a Node/Element reference), I also have nesting information: an array of integers saying which child to step into at each level. E.g. for the array:

2 1 0

and the tree

root
  |--root-child0
  |--root-child1
  |--root-child2
       |--root-child2-child0
       |--root-child2-child1
                   |--root-child2-child1-child0

I am looking for root-child2-child1-child0.

Is there a clean way to find such an item? Plain string search (String.indexOf()) isn't enough, because my XML file contains many identical tags. There is one additional difficulty: between some parent and child tags there can be an extra (Collection) tag. (E.g. the only child of root-child2 could be a Collection, with root-child2-child0 and root-child2-child1 as children of that Collection.)

--edit

If it helps: besides the nesting information mentioned above, I could also have the names of the nodes on the path to the node I'm searching for.

--edit 2

Given an XML file like this:

<catalog>
    <book id="bk101">
        <author>Gambardella, Matthew</author>
        <title>XML Developer's Guide</title>
        <genre>Computer</genre>
    </book>
    <book id="bk102">
        <author>Ralls, Kim</author>
        <title>Midnight Rain</title>
        <genre>Fantasy</genre>
        <price>5.95</price>
    </book>
</catalog>

Let's assume I have a Node object reference to the price tag with value 5.95. Additionally, I have this nesting information:

1  3

which says that it is the second book in the catalog (counting from 0, i.e. the one with id="bk102") and the fourth tag within that book.

What I want is to get something like

xmlRawBody.indexOf("<price>5.95</price>")

Why can't I use this simple method? Because the same tag may also appear somewhere else in the document. I have to use the additional nesting information mentioned above.

心欲静而疯不止 2024-12-07 18:14:39

As I understand it, you have information such as 2 1 0, and from it you derive the name of the node, root-child2-child1-child0.

If the structure is fixed, you can build an XPath expression from that information to get the specified node.

For example, if your values are 2 1 0, build an XPath such as root/root-child2/root-child2-child1/root-child2-child1-child0. You can then use this XPath to retrieve that specific node element.
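A minimal sketch of that idea, assuming only the index array is known and not the element names: each 0-based index becomes a 1-based positional step `*[n]`, which counts element children only, so whitespace text nodes between tags do not shift the positions. The class and method names (`IndexPathXPath`, `toXPath`, `find`) are made up for illustration, and the sketch does not handle the optional Collection wrapper mentioned in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
final class IndexPathXPath {

    // Turns 0-based child indices into a positional XPath, one step per index.
    static String toXPath(int[] childIndices) {
        StringBuilder xpath = new StringBuilder("/*"); // first step selects the root element
        for (int index : childIndices) {
            // XPath positions are 1-based; "*" matches element children only.
            xpath.append("/*[").append(index + 1).append(']');
        }
        return xpath.toString();
    }

    // Parses the XML string and returns the node at the index path, or null if there is none.
    static Node find(String xml, int[] childIndices) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(toXPath(childIndices), doc, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate index path", e);
        }
    }

    public static void main(String[] args) {
        String xml = String.join("\n",
                "<catalog>",
                "    <book id=\"bk101\">",
                "        <author>Gambardella, Matthew</author>",
                "        <title>XML Developer's Guide</title>",
                "        <genre>Computer</genre>",
                "    </book>",
                "    <book id=\"bk102\">",
                "        <author>Ralls, Kim</author>",
                "        <title>Midnight Rain</title>",
                "        <genre>Fantasy</genre>",
                "        <price>5.95</price>",
                "    </book>",
                "</catalog>");
        Node node = find(xml, new int[] {1, 3});
        System.out.println(node.getNodeName() + " = " + node.getTextContent()); // price = 5.95
    }
}
```

This returns the DOM node, not its character offset in the original string; the DOM API does not expose source positions, so the offset has to come from elsewhere.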

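You can write a Java function that builds the XPath from the given array of values. If there are several elements like this, you need to find a way to tell them apart in the XPath, for example with positional predicates. With XPath, such values are easy to retrieve. If you do not want to read the entire XML but only part of it, a SAXParser can be used instead.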
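For the start index the question actually asks for, one possible approach with SAXParser is sketched below. It tracks the index path while parsing and, once the path matches, asks the SAX Locator for the current position. It rests on several assumptions: the Locator reports the position just after the start tag (which is what the SAX documentation describes, though it calls the values approximate), lines in the string end with '\n', and the names `NodeOffsetFinder` and `startIndexOf` are hypothetical. Because a literal '<' cannot occur inside a tag in well-formed XML, searching backwards from the reported position finds the tag's opening bracket.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
final class NodeOffsetFinder {

    // Tracks the index path of each opened element and records where the target's start tag ends.
    private static final class PathHandler extends DefaultHandler {
        private final List<Integer> target = new ArrayList<>();
        private final List<Integer> path = new ArrayList<>();        // indices of open elements below the root
        private final Deque<Integer> nextChild = new ArrayDeque<>(); // per open element: index of its next child
        private Locator locator;
        int line = -1;
        int column = -1;

        PathHandler(int[] childIndices) {
            for (int index : childIndices) target.add(index);
        }

        @Override
        public void setDocumentLocator(Locator locator) {
            this.locator = locator;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if (!nextChild.isEmpty()) { // every element except the root
                int index = nextChild.pop();
                nextChild.push(index + 1);
                path.add(index);
            }
            nextChild.push(0);
            if (line < 0 && locator != null && path.equals(target)) {
                line = locator.getLineNumber();     // 1-based
                column = locator.getColumnNumber(); // 1-based, first character after the start tag
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            nextChild.pop();
            if (!path.isEmpty()) path.remove(path.size() - 1);
        }
    }

    // Returns the index of the '<' that opens the element at the given index path, or -1.
    static int startIndexOf(String xml, int[] childIndices) {
        PathHandler handler = new PathHandler(childIndices);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        if (handler.line < 1 || handler.column < 1) return -1;
        int lineStart = 0;
        for (int l = 1; l < handler.line; l++) {
            lineStart = xml.indexOf('\n', lineStart) + 1; // advance to the start of the next line
        }
        int tagEnd = lineStart + handler.column - 1;      // index just past the start tag
        return xml.lastIndexOf('<', tagEnd - 1);
    }

    public static void main(String[] args) {
        String xml = String.join("\n",
                "<catalog>",
                "    <book id=\"bk101\">",
                "        <price>5.95</price>",
                "    </book>",
                "    <book id=\"bk102\">",
                "        <author>Ralls, Kim</author>",
                "        <price>5.95</price>",
                "    </book>",
                "</catalog>");
        int index = startIndexOf(xml, new int[] {1, 1});
        System.out.println(index + " -> " + xml.substring(index, xml.indexOf('\n', index)));
    }
}
```

As in the question, both books here contain an identical price tag, so a plain indexOf would return the first one, while the path {1, 1} selects the second. An optional Collection wrapper would need an extra rule, for instance skipping elements with that name when counting.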

Hope this helps.