Ignoring spaces, carriage returns and tabs between elements when parsing an XML file
I want to parse an XML file with the Xerces-C++ SAX parser while ignoring any spaces, carriage returns and tab characters that are NOT inside attribute values or between an element's start and end tags. In other words, I want to ignore the whitespace that sits between tags.
For instance, in the following XML file:
<tag1 attr1="val 1"><tag2>my text here</tag2>
[many white spaces here] </tag1>
I want to preserve the whitespace inside the strings 'val 1' and 'my text here', but ignore the carriage return and the many whitespace characters between the closing </tag2> and the closing </tag1>.
I tried using a boolean flag 'withinElement', set to true in startElement() and to false in endElement(), but that does not let me ignore the whitespace characters between </tag2> and </tag1>, for instance.
Should this be done in the characters() method? If so, how, given that there does not seem to be a way to know exactly where we are when characters() is invoked?
Comments (1)
You could ask the parser to validate the XML file. You will then receive all the ignorable whitespace through the ignorableWhitespace method and the "good" whitespace through characters. Note that this relies on a DTD or schema declaring the content of each element, since that is how the parser knows which whitespace is ignorable.
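If no DTD or schema is available, one possible fallback is to buffer the text in characters() and decide what to do with it when the next startElement() or endElement() arrives. The following is only a minimal sketch of that idea: TextCollector and replaySample are hypothetical names, the class is a stand-in for a Xerces-C++ handler rather than a real DefaultHandler subclass, and it uses std::string where Xerces would pass XMLCh buffers.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a SAX content handler. With Xerces-C++ the same
// logic would live in a handler subclass and operate on XMLCh buffers.
class TextCollector {
public:
    // Any tag boundary ends the current run of character data.
    void startElement(const std::string& /*name*/) { flush(); }
    void endElement(const std::string& /*name*/) { flush(); }

    // A parser may deliver one run of text in several chunks, so only
    // accumulate here and postpone the decision until the next tag.
    void characters(const std::string& chunk) { pending_ += chunk; }

    const std::vector<std::string>& texts() const { return texts_; }

private:
    // True if the string is empty or holds only XML whitespace characters.
    static bool isWhitespaceOnly(const std::string& s) {
        for (char c : s) {
            if (c != ' ' && c != '\t' && c != '\r' && c != '\n') return false;
        }
        return true;
    }

    // Keep the buffered text unless it is whitespace only, then reset.
    void flush() {
        if (!isWhitespaceOnly(pending_)) texts_.push_back(pending_);
        pending_.clear();
    }

    std::string pending_;
    std::vector<std::string> texts_;
};

// Replays the callbacks a parser might fire for the sample document in the
// question (attribute values never pass through characters()).
std::vector<std::string> replaySample() {
    TextCollector handler;
    handler.startElement("tag1");
    handler.startElement("tag2");
    handler.characters("my text ");   // text split across two chunks
    handler.characters("here");
    handler.endElement("tag2");
    handler.characters("\n        "); // whitespace between </tag2> and </tag1>
    handler.endElement("tag1");
    return handler.texts();
}
```

With this sketch, replaySample() keeps "my text here" with its inner spaces and drops the run between </tag2> and </tag1>. The trade-off is that it also drops whitespace-only content that was intentional, such as an element containing nothing but spaces, which is something the validating approach can distinguish and this one cannot.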