rapidxml parse error with a URL attribute

Posted on 2024-11-02 13:45:02

I'm getting a strange error with rapidxml when parsing an XML file like this:

<?xml version="1.0" encoding="UTF-8"?>
<IMG align="left"
 src="http://www.w3.org/Icons/WWW/w3c_home" />

It throws "expected >". I'm using code like the following to parse the data:

std::fstream file("./test.xml");
std::istream_iterator<char> eos;
std::istream_iterator<char> iit (file);

std::vector<char> xml(iit, eos);
xml.push_back('\0');

xml_document<> doc;
doc.parse<0>(&xml[0]);

The "/" symbol in the IMG tag seems to be the problem. Is this a rapidxml bug, or am I doing something wrong?

Comments (3)

嘿看小鸭子会跑 2024-11-09 13:45:02

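The way you load the XML data into the vector is wrong. C++ streams have the "skipws" flag set by default, and std::istream_iterator<char> reads through the formatted operator>>, so every whitespace character in the input is skipped. You can verify this by examining the contents of your vector: all spaces and line endings will be missing, so <IMG align="left" is stored as <IMGalign="left". The parser is no longer looking at the markup that is in the file, which is why it complains.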
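The effect is easy to reproduce without a file. What follows is only a minimal sketch: it assumes a std::istringstream as a stand-in for the std::fstream in the question, and read_like_question is a hypothetical helper name, not part of rapidxml.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: reads `text` the same way the question does,
// through std::istream_iterator<char> on a stream with default flags.
std::string read_like_question(const std::string& text)
{
    std::istringstream in(text);              // stand-in for the std::fstream
    std::istream_iterator<char> it(in), eos;  // formatted reads, skipws is still set
    return std::string(it, eos);              // every whitespace character is dropped
}
```

Passing the IMG element from the question through this helper yields <IMGalign="left"src="http://www.w3.org/Icons/WWW/w3c_home"/>, which is what the parser actually receives.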

Unset the skipws flag on the stream, before creating the iterators, to get the correct behaviour:

file.unsetf(std::ios::skipws);

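Alternatively, you can use the file class from rapidxml_utils.hpp to load the file:

#include "rapidxml.hpp"
#include "rapidxml_utils.hpp"

using namespace rapidxml;
file<> xml_file("test.xml");    // loads the whole file into a zero-terminated buffer
xml_document<> doc;
doc.parse<0>(xml_file.data());  // xml_file must outlive doc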

Sadly, loading text files with C++ streams is very tricky and full of traps.

As for sehe's tests in the other answer, the "incorrectly accepted" cases are by design (I don't have enough reputation to add comments to his answer). You need to use the "parse_validate_closing_tags" parse flag to make the parser check whether each end tag name matches its start tag name:

doc.parse<parse_validate_closing_tags>(...);

See parse_validate_closing_tags in the rapidxml manual.
The rationale for this behaviour is performance: verifying end tags is time-consuming and in most cases not needed.

著墨染雨君画夕 2024-11-09 13:45:02

I just tried it out of curiosity. RapidXml might be fast, but it sure isn't very good.

#include "rapidxml.hpp"

int main(int argc, char* args[])
{
        using namespace rapidxml;
        if (argc < 2)
                return 1;          // expects the XML text as the first argument
        xml_document<> doc;    // character type defaults to char
        doc.parse<0>(args[1]);    // 0 means default parse flags; args[1] is parsed in place
}

Invoking it results in all kinds of funny business:

Correctly accepted:

$ ./test.exe "<hello>world</hello>"

$ ./test.exe '<?xml version="1.0" encoding="UTF-8"?> <IMG align="left" src="http://www.w3.org/Icons/WWW/w3c_home" />'

Correctly rejected:

$ ./test.exe '<hello we="" / >'
terminate called after throwing an instance of 'rapidxml::parse_error'
  what():  expected >
Aborted (core dumped)

Incorrectly accepted:

$ ./test.exe '<hello we="close">world</die><zellq></die>'

$ ./test.exe '<hello we="close/">world</die><we horrible=""></don'\''t>'

YMMV

无力看清 2024-11-09 13:45:02

Your XML is valid. If the code and the XML are exactly as you posted, it must be a rapidxml bug. My guess is that it either doesn't support breaking an attribute list across multiple lines or, less likely, doesn't support /> at the end of a tag.
