Ignoring whitespace, carriage returns and tabs between elements when parsing an XML file

Posted on 2024-10-20 04:39:37 · 610 characters · 11 views · 0 comments


I want to parse an XML file with the Xerces-C++ SAX parser while ignoring any whitespace, carriage return and tab characters that are NOT inside an attribute value or inside the text between a start tag and its end tag. In other words, I want to ignore whitespace, carriage returns and tabs that appear only between tags.

For instance, in the following XML file:

<tag1 attr1="val 1"><tag2>my text here</tag2>

[many white spaces here] </tag1>

I want to preserve the whitespace within the strings 'val 1' and 'my text here', but ignore the carriage return and the many whitespace characters between the closing </tag2> and the closing </tag1>.

I tried using a boolean flag 'withinElement' that is set to true in startElement() and to false in endElement(), but that does not let me ignore the whitespace characters between </tag2> and </tag1>, for instance.

Should this be done in the characters() method? If so, how, given that there does not seem to be a way to know exactly where we are in the document when characters() is invoked?
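
One possible way to approach the question above, offered only as a sketch: buffer the text in characters(), and decide what to do with it when the next startElement() or endElement() arrives. If the buffered run contains only whitespace, drop it; otherwise pass it on unchanged. The code below does not use Xerces-C++ at all. `WhitespaceFilteringHandler` and `replayExample` are hypothetical names, the callbacks take `std::string` instead of the real SAX signatures, and attributes are left out because attribute values reach the handler through startElement() and are not affected by text filtering.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for a SAX content handler (not the Xerces-C++ API).
// It records the events it would forward, so the filtering can be inspected.
class WhitespaceFilteringHandler {
public:
    void startElement(const std::string& name) {
        flushText();  // text seen before this tag is now complete
        events_.push_back("start:" + name);
    }

    void endElement(const std::string& name) {
        flushText();
        events_.push_back("end:" + name);
    }

    // A SAX parser may deliver one text run in several chunks,
    // so this callback only accumulates.
    void characters(const std::string& chunk) { buffer_ += chunk; }

    const std::vector<std::string>& events() const { return events_; }

private:
    static bool isAllWhitespace(const std::string& s) {
        for (unsigned char c : s) {
            if (!std::isspace(c)) return false;
        }
        return true;  // also true for an empty buffer
    }

    // Forward the buffered text unless it is whitespace-only, then reset.
    void flushText() {
        if (!isAllWhitespace(buffer_)) events_.push_back("text:" + buffer_);
        buffer_.clear();
    }

    std::string buffer_;
    std::vector<std::string> events_;
};

// Replays the callbacks a SAX parser would plausibly make for the
// example document in the question (the chunk boundaries are assumed).
std::vector<std::string> replayExample() {
    WhitespaceFilteringHandler h;
    h.startElement("tag1");
    h.startElement("tag2");
    h.characters("my text ");
    h.characters("here");
    h.endElement("tag2");
    h.characters("\n\n   \t ");  // whitespace between </tag2> and </tag1>
    h.endElement("tag1");
    return h.events();
}
```

With a real Xerces-C++ handler the same buffering would live in the handler's characters() override, which receives XMLCh data and a length rather than a std::string. Note that this sketch also drops whitespace-only text inside a leaf element such as `<tag2>   </tag2>`; if that must be preserved, the handler would additionally need to track whether the enclosing element has had any child elements.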
