RapidXML reading from a file - what is wrong here?

Posted 2024-11-15 18:21:36

What's the difference between these two methods of reading an input file?

1) Using 'ifstream.get()'

and

2) Using a vector<char> with istreambuf_iterator<char> (less understood by me!)

(other than the obvious answer of having nifty vector methods to work with)

The input file is XML and, as you can see below, it is parsed immediately into a rapidxml document (which is initialized elsewhere; see the example main function).

First, let me show you two ways to write the 'load_config' function, one using ifstream.get() and one using vector<char>.

Method 1, using ifstream.get(), gives working code and an intact rapidXML document object:

rapidxml::xml_document<> *load_config(rapidxml::xml_document<> *doc){
   ifstream myfile("inputfile");

   //read in config file
   char ch;
   char buffer[65536];
   size_t chars_read = 0;

   while(myfile.get(ch) && (chars_read < 65535)){
      buffer[chars_read++] = ch;
   }
   buffer[chars_read++] = '\0';

   cout<<"clearing old doc"<<endl;
   doc->clear();

   doc->parse<0>(buffer);

   //debug returns as expected here
   cout << "load_config: Name of my first node is: " << doc->first_node()->name() << "\n";

   return doc;
}

Method 2 results in the rapidXML document being clobbered by another library, specifically by a call to curl_global_init(CURL_GLOBAL_SSL) [see main code below], but I'm not blaming curl_global_init just yet.

rapidxml::xml_document<> *load_config(rapidxml::xml_document<> *doc){
   ifstream myfile("inputfile");

   vector<char> buffer((istreambuf_iterator<char>(myfile)),
                istreambuf_iterator<char>());
   buffer.push_back('\0');

   cout<<"file looks like:"<<endl;  //looks fine
   cout<<&buffer[0]<<endl;

   cout<<"clearing old doc"<<endl;
   doc->clear();

   doc->parse<0>(&buffer[0]);

   //debug prints as expected
   cout << "load_config: Name of my first node is: " << doc->first_node()->name() << "\n";

   return doc;
}
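
For comparison, here is a minimal sketch of the two reading strategies with RapidXML taken out of the picture. The function names read_with_get and read_with_iterators are made up for illustration, and both return a vector<char> so the results can be compared directly (the original Method 1 uses a fixed char array instead). As far as the bytes are concerned, the two approaches should produce the same null-terminated text; the only visible difference is that the get() loop truncates input that does not fit its capacity.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Method 1 (hypothetical helper): read one character at a time, keeping at
// most `capacity - 1` characters so there is room for the terminator.
// Anything beyond that limit is silently left unread.
std::vector<char> read_with_get(std::istream& in, std::size_t capacity) {
    assert(capacity > 0);
    std::vector<char> buffer;
    buffer.reserve(capacity);
    char ch;
    while (buffer.size() < capacity - 1 && in.get(ch)) {
        buffer.push_back(ch);
    }
    buffer.push_back('\0');
    return buffer;
}

// Method 2 (hypothetical helper): the vector's iterator-range constructor
// pulls characters from the stream buffer until end of input.
// The extra parentheses around the first argument avoid the
// "most vexing parse", where the line would declare a function instead.
std::vector<char> read_with_iterators(std::istream& in) {
    std::vector<char> buffer((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());
    buffer.push_back('\0');
    return buffer;
}
```

Feeding both helpers from two std::istringstream objects that hold the same text should give equal vectors, which suggests the corruption is not caused by what gets read but by something that happens to the buffer afterwards.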

main code:

int main(void){
   rapidxml::xml_document<> *doc;
   doc = new rapidxml::xml_document<>;

   load_config(doc);

   // this works fine:
   cout << "Name of my first node is: " << doc->first_node()->name() << "\n"; 

   curl_global_init(CURL_GLOBAL_SSL);  //Docs say do this first.

   // debug broken object instance:
   // note a trashed 'doc' here if using vector<char> method 
   //  - seems to be because of above line... name is NULL 
   //    and other nodes are now NULL
   //    causing segfaults down stream.
   cout << "Name of my first node is: " << doc->first_node()->name() << "\n";

I am pretty darn sure this is all executed in a single thread, but maybe there is something going on beyond my understanding.

I'm also worried that I only fixed a symptom, not a cause... by simply changing my file load function. Looking to the community for help here!

Question: Why would moving away from the vector to a character array fix this?

Hint: I'm aware that rapidXML uses some clever memory management that actually accesses the input string directly.
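
If that hint is right, the likely consequence is a lifetime problem: the RapidXML manual says the source text must persist for the lifetime of the document, because nodes point into it rather than copying it. Both versions of load_config parse a buffer that is local to the function, so after the return the document would refer to storage that no longer exists. The stack array in Method 1 may simply happen to survive untouched for a while, whereas the vector's heap block is freed and can be reused by the next allocation (for example inside curl_global_init). The sketch below illustrates the idea using only the standard library. first_element_name_in_place is a hypothetical stand-in for an in-situ parser, not RapidXML's API, and LoadedConfig is a hypothetical holder that keeps the text and the pointer into it together.

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an in-situ parser: it copies nothing. It finds
// the first element name, terminates it in place, and returns a pointer
// INTO `text`. The result is only valid while `text` is alive.
char* first_element_name_in_place(char* text) {
    char* open = std::strchr(text, '<');
    if (open == nullptr) return nullptr;
    char* name = open + 1;
    char* end = name + std::strcspn(name, " />");
    if (*end == '\0') return nullptr;  // unterminated tag
    *end = '\0';                       // destructive edit of the source text
    return name;
}

// Hypothetical holder: owns the text and the pointer derived from it, so
// the pointer cannot outlive the storage it refers to.
class LoadedConfig {
public:
    explicit LoadedConfig(std::istream& in)
        : text_((std::istreambuf_iterator<char>(in)),
                std::istreambuf_iterator<char>()) {
        text_.push_back('\0');
        root_name_ = first_element_name_in_place(text_.data());
    }

    // A copy would leave root_name_ pointing into the other object's
    // buffer, so copying is disabled in this sketch.
    LoadedConfig(const LoadedConfig&) = delete;
    LoadedConfig& operator=(const LoadedConfig&) = delete;

    const char* root_name() const { return root_name_; }

private:
    std::vector<char> text_;     // must live as long as root_name_ is used
    char* root_name_ = nullptr;  // points into text_
};
```

Applied to the code in this question, the same idea would mean keeping the vector<char> next to the xml_document (as members of one object, or both owned by the caller) instead of letting the buffer go out of scope at the end of load_config.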

Hint: The main function above creates a dynamic (new) xml_document. This was not in the original code, and is an artifact of debugging changes. The original (failing) code declared it and did not dynamically allocate it, but identical problems occurred.

Another Hint for full disclosure (although I don't see why it matters) - there is another instance of a vector in this mess of code that is populated by the data in the rapidxml::xml_document object.
