RapidXML reading from file - what is wrong here?
What's the difference between these two methods of reading an input file (other than the obvious answer of having nifty vector methods to work with)?
1) Using ifstream.get()
2) Using a vector<char> with istreambuf_iterator<char> (less understood by me!)
The input file is XML and, as you can see below, is immediately parsed into a rapidxml document (the document is initialized elsewhere; see the example main function).
First, let me show you two ways to write the load_config function, one using ifstream.get() and one using vector<char>.
Method 1, ifstream.get(), provides working code and a safe rapidXML document object:
rapidxml::xml_document<> *load_config(rapidxml::xml_document<> *doc){
    ifstream myfile("inputfile");
    //read in config file
    char ch;
    char buffer[65536];
    size_t chars_read = 0;
    while(myfile.get(ch) && (chars_read < 65535)){
        buffer[chars_read++] = ch;
    }
    buffer[chars_read++] = '\0';
    cout<<"clearing old doc"<<endl;
    doc->clear();
    doc->parse<0>(buffer);
    //debug returns as expected here
    cout << "load_config: Name of my first node is: " << doc->first_node()->name() << "\n";
    return doc;
}
Method 2 results in the rapidXML document being clobbered by another library, specifically by a call to curl_global_init(CURL_GLOBAL_SSL) (see the main code below), but I'm not blaming curl_global_init just yet.
rapidxml::xml_document<> *load_config(rapidxml::xml_document<> *doc){
    ifstream myfile("inputfile");
    vector<char> buffer((istreambuf_iterator<char>(myfile)),
                        istreambuf_iterator<char>( ));
    buffer.push_back('\0');
    cout<<"file looks like:"<<endl; //looks fine
    cout<<&buffer[0]<<endl;
    cout<<"clearing old doc"<<endl;
    doc->clear();
    doc->parse<0>(&buffer[0]);
    //debug prints as expected
    cout << "load_config: Name of my first node is: " << doc->first_node()->name() << "\n";
    return doc;
}
Main code:
int main(void){
    rapidxml::xml_document<> *doc;
    doc = new rapidxml::xml_document<>;
    load_config(doc);
    // this works fine:
    cout << "Name of my first node is: " << doc->first_node()->name() << "\n";
    curl_global_init(CURL_GLOBAL_SSL); //Docs say do this first.
    // debug broken object instance:
    // note a trashed 'doc' here if using vector<char> method
    // - seems to be because of above line... name is NULL
    // and other nodes are now NULL
    // causing segfaults down stream.
    cout << "Name of my first node is: " << doc->first_node()->name() << "\n";
    // ...
}
I am pretty darn sure this is all executed in a single thread, but maybe there is something going on beyond my understanding.
I'm also worried that I only fixed a symptom, not a cause... by simply changing my file load function. Looking to the community for help here!
Question: Why would moving away from the vector to a character array fix this?
Hint: I'm aware that rapidXML uses some clever memory management that actually accesses the input string directly.
Hint: The main function above creates a dynamic (new) xml_document. This was not in the original code, and is an artifact of debugging changes. The original (failing) code declared it and did not dynamically allocate it, but identical problems occurred.
Another Hint for full disclosure (although I don't see why it matters) - there is another instance of a vector in this mess of code that is populated by the data in the rapidxml::xml_document object.
The only difference between the two is how much of the file they can hold: the vector version reads a file of any size, while the char array version stops after 65535 characters (the last of the 65536 slots is reserved for the \0), so a longer file is silently truncated.
Another problem, common to both versions, is that you read the file into memory that has a shorter lifetime than the xml_document. Read the documentation of xml_document::parse: the string you pass must persist for the lifetime of the document, because the parser modifies it in place and node names and values point directly into it.
When load_config exits, the vector is destroyed and its memory is freed. Any later attempt to access the document reads invalid memory (undefined behavior). This is most likely also why curl_global_init looks like the culprit: its own heap allocations can reuse the block the vector just released and overwrite the text your nodes still point to.
In the char array version the memory is allocated on the stack. It is just as "freed" when load_config exits (accessing it causes undefined behavior), but you don't see the crash because that stack memory has not been overwritten yet.