I don't want to parse certain tags in XML

Posted on 2024-09-16 04:13:03

This is a sample of the XML I am currently working with:

<smsq>
  <sms>
  <id>96</id>
  <to>03333560511</to>
  <msg>  danial says: hahaha <space> nothing.
  </msg>
  </sms>
</smsq>

Please notice that the <msg> tag can contain other tags (which should not be parsed), and I had to write a DTD for that. The DTD was something like this:

<!DOCTYPE smsq [
  <!ELEMENT sms (mID,to,msg,type)>
  <!ELEMENT mID (#PCDATA)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT msg (CDATA)>
]>

But the problem is that the XML parser still goes into the <msg> tag and says that the <space> tag should be closed with a </space> tag. I just want to fetch the data as it is from the XML, and I do not want msg to be parsed any further.

Please help me resolve the problem and tell me if this can be done with DTDs.

Thanks!

Comments (5)

谢绝鈎搭 2024-09-23 04:13:03

You can't make a DTD that makes buggy XML magically not buggy. The XML is not well-formed, so it can never be valid, as well-formedness is a prerequisite of validity (validity isn't even important here, AFAICT). It's analogous to how the words in an English sentence all have to be English words before it can be a grammatically correct English sentence.

<space> is not closed. It should either be followed by a </space> inside the <msg>, or be replaced with <space/>, or, if by saying you don't want it to be parsed you mean that you want the actual text "<space>" in there, be encoded as such (i.e. &lt;space&gt;).
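The three options can be checked against a real parser. The following is a minimal sketch, assuming the JDK's built-in JAXP parser is available; the class name WellFormedCheck and the method isWellFormed are hypothetical names chosen for illustration, not part of the original post.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string is well-formed XML.
final class WellFormedCheck {
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of logging them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the document, e.g. because of an unclosed tag.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unclosed <space>: rejected.
        System.out.println(isWellFormed("<msg>hahaha <space> nothing.</msg>"));
        // Closed, self-closing, and escaped variants: accepted.
        System.out.println(isWellFormed("<msg>hahaha <space></space> nothing.</msg>"));
        System.out.println(isWellFormed("<msg>hahaha <space/> nothing.</msg>"));
        System.out.println(isWellFormed("<msg>hahaha &lt;space&gt; nothing.</msg>"));
    }
}
```

No DTD is involved in any of these calls: the rejection happens at the well-formedness stage, before validation would even start.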

月依秋水 2024-09-23 04:13:03

DTD can't help you with this problem. DTD is by no means required (though it is quite handy to have it).

The document you posted above is not a well-formed XML document. Period. That's the way it is, and no reasonable XML parser will parse it for you without raising an error.

What you can do, though, is substitute the < symbol with the &lt; XML entity.
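If the message text is written by your own code, the substitution can be done just before the text is placed inside <msg>. This is a minimal sketch under that assumption; XmlText, escape and msgElement are hypothetical names. It also escapes &, which has to be replaced first so that the entities produced by the later replacements are not escaped a second time.

```java
// Hypothetical helper: escapes text so it can be used as XML element content.
final class XmlText {
    static String escape(String text) {
        return text
                .replace("&", "&amp;")  // must come first
                .replace("<", "&lt;")
                .replace(">", "&gt;");  // optional in most positions, harmless to escape
    }

    // Builds a <msg> element whose content is plain text, never markup.
    static String msgElement(String text) {
        return "<msg>" + escape(text) + "</msg>";
    }

    public static void main(String[] args) {
        System.out.println(msgElement("danial says: hahaha <space> nothing."));
    }
}
```

A parser reading the result hands back the original text, with the entities already decoded, as the content of msg.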

只为一人 2024-09-23 04:13:03

Firstly, the sample XML is not really XML, as the <space> tag is not closed.

Secondly, it looks like the reason for not wanting to parse the <space> tag is that it's not really XML, just text that looks like XML. The text should be either escaped/encoded or enclosed in a CDATA section.
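The CDATA option could look like the sketch below, which again assumes that you control the code producing the XML. CdataExample, wrapInCdata and readMsg are hypothetical names. The only sequence that cannot appear inside a CDATA section is "]]>", so the sketch splits it across two sections.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: stores raw text in a CDATA section and reads it back.
final class CdataExample {
    static String wrapInCdata(String text) {
        // "]]>" would end the section early, so split it across two sections.
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    // Parses the XML and returns the text content of the root element.
    static String readMsg(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "danial says: hahaha <space> nothing.";
        String xml = "<msg>" + wrapInCdata(raw) + "</msg>";
        System.out.println(xml);
        System.out.println(readMsg(xml));
    }
}
```

Note that CDATA is written in the document itself. Declaring msg as (CDATA) in the DTD, as in the question, does not have this effect.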

Lastly, if what you want to parse really is XML and you only want to parse the first-level tags, I wouldn't bother with a real XML parser. I'd create my own ultra-simple parser: all it has to do is parse first-level nodes, and that shouldn't be too hard.

Good luck!

划一舟意中人 2024-09-23 04:13:03

All XML tags have to be closed, either like <tag></tag> or <tag />.

If you want the <space> tag to be parsed as the text value of a tag, and not as a child tag, use &lt; and &gt; instead of < and >:

&lt;space&gt;

不弃不离 2024-09-23 04:13:03

I would isolate the solution to your problem into a method and deal with it simply for now. After all, you may not have control over the correctness of the message content.

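private static String getMessage(String msg) {
    // Take everything between the first <msg> and the last </msg>, unparsed.
    int start = msg.indexOf("<msg>") + "<msg>".length();
    int end = msg.lastIndexOf("</msg>");
    return msg.substring(start, end);
}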

You may enhance it later, as more use cases become available.

Edit: If someone puts an "msg" element in the content, it still works, because lastIndexOf finds the outermost closing tag.
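A usage sketch of the method on the sample from the question, with the method copied into a hypothetical wrapper class named MsgExtractor so that it can run on its own:

```java
// Hypothetical wrapper class around the getMessage method shown above.
final class MsgExtractor {
    static String getMessage(String msg) {
        // Take everything between the first <msg> and the last </msg>, unparsed.
        int start = msg.indexOf("<msg>") + "<msg>".length();
        int end = msg.lastIndexOf("</msg>");
        return msg.substring(start, end);
    }

    public static void main(String[] args) {
        String xml = "<smsq>\n"
                + "  <sms>\n"
                + "  <id>96</id>\n"
                + "  <to>03333560511</to>\n"
                + "  <msg>  danial says: hahaha <space> nothing.\n"
                + "  </msg>\n"
                + "  </sms>\n"
                + "</smsq>";
        // Prints the raw message body, including the unclosed <space>.
        System.out.println(getMessage(xml));
    }
}
```

One limitation follows directly from the indexOf/lastIndexOf pair: if the input holds several sms records, the result runs from the first <msg> to the last </msg>, so the input would have to be split into one record per call first.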
