RapidXML reading from file - what's wrong here?

Posted on 2024-11-15 18:21:36

What's the difference between these two methods of reading an input file?

1) Using 'ifstream.get()'

and

2) Using a vector<char> with istreambuf_iterator<char> (less understood by me!)

(other than the obvious answer of having nifty vector methods to work with)
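For reference, here is the vector<char> idiom on its own, as a minimal sketch. The function name read_file_nul_terminated is hypothetical (it is not part of RapidXML or the original code). std::istreambuf_iterator<char> reads raw characters straight from the stream's buffer, whitespace included, until end of file, and a default-constructed iterator serves as the end-of-stream marker, so unlike the fixed char array there is no 64 KiB cap. The trailing '\0' yields the mutable, NUL-terminated buffer that rapidxml's parse<>() expects.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file into a vector<char> and appends '\0'.
std::vector<char> read_file_nul_terminated(const std::string& path) {
    std::ifstream in(path, std::ios::binary);  // binary: no newline translation
    if (!in) throw std::runtime_error("cannot open " + path);

    // istreambuf_iterator<char>(in) walks the stream buffer one char at a time;
    // istreambuf_iterator<char>() compares equal to it once the stream is exhausted.
    // The extra parentheses around the first argument avoid the "most vexing parse",
    // where this line would otherwise declare a function named buffer.
    std::vector<char> buffer((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());
    buffer.push_back('\0');  // parse<>() needs a zero-terminated string
    return buffer;
}
```

Note that the returned vector still has to outlive any document parsed from it.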

The input file is XML and, as you can see below, is immediately parsed into a rapidxml document (initialized elsewhere; see the example main function).

First, let me show you two ways to write the 'load_config' function, one using ifstream.get() and one using vector<char>:

Method 1, using ifstream.get(), produces working code and a safe rapidXML document object:

rapidxml::xml_document<> *load_config(rapidxml::xml_document<> *doc){
   ifstream myfile("inputfile");

   //read in config file
   char ch;
   char buffer[65536];
   size_t chars_read = 0;

   while(myfile.get(ch) && (chars_read < 65535)){
      buffer[chars_read++] = ch;
   }
   buffer[chars_read++] = '\0';

   cout<<"clearing old doc"<<endl;
   doc->clear();

   doc->parse<0>(buffer);

   //debug returns as expected here
   cout << "load_config: Name of my first node is: " << doc->first_node()->name() << "\n";

   return doc;
}

Method 2 results in a rapidXML document that gets clobbered by another library, specifically by a call to curl_global_init(CURL_GLOBAL_SSL) [see main code below], but I'm not blaming it on curl_global_init just yet.

rapidxml::xml_document<> *load_config(rapidxml::xml_document<> *doc){
   ifstream myfile("inputfile");

   vector<char> buffer((istreambuf_iterator<char>(myfile)), 
                istreambuf_iterator<char>( ));
   buffer.push_back('\0');

   cout<<"file looks like:"<<endl;  //looks fine
   cout<<&buffer[0]<<endl;

   cout<<"clearing old doc"<<endl;
   doc->clear();

   doc->parse<0>(&buffer[0]);

   //debug prints as expected
   cout << "load_config: Name of my first node is: " << doc->first_node()->name() << "\n";

   return doc;
}

Main code:

int main(void){
   rapidxml::xml_document<> *doc;
   doc = new rapidxml::xml_document<>;

   load_config(doc);

   // this works fine:
   cout << "Name of my first node is: " << doc->first_node()->name() << "\n"; 

   curl_global_init(CURL_GLOBAL_SSL);  //Docs say do this first.

   // debug broken object instance:
   // note a trashed 'doc' here if using vector<char> method 
   //  - seems to be because of above line... name is NULL 
   //    and other nodes are now NULL
   //    causing segfaults down stream.
   cout << "Name of my first node is: " << doc->first_node()->name() << "\n"; 

   // ...
}

I am pretty darn sure this is all executed in a single thread, but maybe there is something going on beyond my understanding.

I'm also worried that I only fixed a symptom, not a cause... by simply changing my file load function. Looking to the community for help here!

Question: Why would moving away from the vector to a character array fix this?

Hint: I'm aware that rapidXML uses some clever memory management that actually accesses the input string directly.

Hint: The main function above creates a dynamic (new) xml_document. This was not in the original code, and is an artifact of debugging changes. The original (failing) code declared it and did not dynamically allocate it, but identical problems occurred.

Another Hint for full disclosure (although I don't see why it matters) - there is another instance of a vector in this mess of code that is populated by the data in the rapidxml::xml_document object.


Comments (1)

似最初, 2024-11-22 18:21:36

The only difference between the two is that the vector version works correctly and the char array version causes undefined behavior when the file is longer than 65535 characters (it writes the \0 to the 65535th or 65536th position, which are out-of-bounds).
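If a fixed-size buffer is kept anyway, the bound has to leave one slot for the terminator, and the caller should find out when the file did not fit. A minimal sketch, assuming the same file-by-name usage as the question; read_bounded is a hypothetical helper, not from the original code. It checks the bound before reading, so no character is consumed and then dropped, and it reports truncation instead of silently cutting the XML short.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: copies at most cap - 1 characters of the file into buf and
// always NUL-terminates it. Returns false if the file could not be opened or if it
// had more data than fit (i.e. the content in buf is truncated).
bool read_bounded(const std::string& path, char* buf, std::size_t cap) {
    if (cap == 0) return false;
    buf[0] = '\0';
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;

    std::size_t n = 0;
    char ch;
    // Test the bound first: one slot always stays free for the terminator.
    while (n + 1 < cap && in.get(ch)) buf[n++] = ch;
    buf[n] = '\0';

    // peek() returns eof both at end of file and on a failed stream, so this is
    // true exactly when nothing was left unread.
    return in.peek() == std::char_traits<char>::eof();
}
```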

Another problem, common to both versions, is that you read the file into memory whose lifetime is shorter than that of the xml_document. Read the documentation:

The string must persist for the lifetime of the document.

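When load_config exits, the vector is destroyed and its memory is freed. Any later access to the document reads invalid memory (undefined behavior). The freed heap block can be handed out again by the next allocation, which would explain why the damage only shows up after an unrelated call such as curl_global_init.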

In the char array version, the memory is allocated on the stack. It is effectively 'freed' as well when load_config exits (accessing it causes undefined behavior), but you don't see a crash because it has not been overwritten yet.
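One way to follow the "string must persist" rule is to keep the buffer and the document together in one object, so the text lives exactly as long as the document that points into it. The sketch below is only an illustration under stated assumptions: InSituDoc is a hypothetical stand-in for rapidxml::xml_document<> so the example compiles without RapidXML, and LoadedConfig and load_config are illustrative names. The comments mark the lines to swap in when using RapidXML itself. If your copy of RapidXML ships rapidxml_utils.hpp, its rapidxml::file<> class, which owns a loaded buffer exposed through data(), can play the role of the text member.

```cpp
#include <cctype>
#include <cstring>
#include <fstream>
#include <iterator>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// HYPOTHETICAL stand-in for rapidxml::xml_document<>. Like rapidxml's default parse,
// it copies nothing: it writes a '\0' into the caller's buffer and keeps a pointer into it.
class InSituDoc {
public:
    void parse(char* text) {
        first_name_ = nullptr;
        char* p = std::strchr(text, '<');
        if (!p) return;
        char* end = ++p;
        while (*end && *end != '>' && *end != '/' &&
               !std::isspace(static_cast<unsigned char>(*end)))
            ++end;
        *end = '\0';       // terminate the tag name inside the caller's buffer
        first_name_ = p;   // points into the caller's buffer, not into this object
    }
    const char* first_name() const { return first_name_; }

private:
    const char* first_name_ = nullptr;
};

// The buffer and the document share one lifetime.
struct LoadedConfig {
    std::vector<char> text;  // owns the characters the document points into
    InSituDoc doc;           // with RapidXML: rapidxml::xml_document<> doc;
};

std::unique_ptr<LoadedConfig> load_config(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);

    auto cfg = std::make_unique<LoadedConfig>();
    cfg->text.assign(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
    cfg->text.push_back('\0');
    cfg->doc.parse(cfg->text.data());  // with RapidXML: cfg->doc.parse<0>(cfg->text.data());
    return cfg;  // heap-allocated: text is never copied, so the pointers stay valid
}
```

Returning the object through unique_ptr is a deliberate choice: copying a LoadedConfig by value would copy text but leave the document pointing into the original buffer, recreating the same dangling-pointer problem.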
