Improving space complexity when reading a file

Posted on 2024-11-27 17:39:57

I have an arbitrarily long line of integers (or floating point values) separated by commas in a file:

1,2,3,4,5,6,7,8,2,3,4,5,6,7,8,9,3,...  (can go up to >100 MB)

Now, I have to read these values and store them in an array.

My current implementation looks like this:

 float* read_line(int dimension)
   {
     float *values = new float[dimension*dimension]; // a line will have dimension^2 values
     std::string line;
     char *token = NULL, *buffer = NULL, *tmp = NULL;
     int count = 0;

     getline(file, line);
     buffer = new char[line.length() + 1];
     strcpy(buffer, line.c_str());
     for( token = strtok(buffer, ","); token != NULL; token = strtok(NULL, ","), count++ )
       {
         values[count] = strtod(token, &tmp);
       }
     delete[] buffer; // buffer was allocated with new[], so it needs delete[]
     return values;
   }

I don't like this implementation because:

  • With ifstream and getline, the entire line is loaded into memory and
    then copied again into a float[]
  • There is unnecessary duplication (the std::string is copied into a separate char buffer for strtok)

What are ways to optimize memory utilization?

Thanks!

Comments (3)

秋风の叶未落 2024-12-04 17:39:57

Something like this?

float val;
while (file >> val)
{
  values[count++] = val;
  char comma;
  file >> comma; // skip comma
}

梦晓ヶ微光ヅ倾城 2024-12-04 17:39:57

Using boost tokenizer and istreambuf_iterator:

std::vector<float> test; // Optionally call reserve() to avoid frequent memory reallocation
boost::tokenizer<boost::char_separator<char>, std::istreambuf_iterator<char> > tokens(
    std::istreambuf_iterator<char>(in),
    std::istreambuf_iterator<char>(),
    boost::char_separator<char>(","));
// Replace this lambda with your favourite conversion function.
std::transform(tokens.begin(), tokens.end(), std::back_inserter(test),
               [](std::basic_string<char> s) { return atof(s.c_str()); });

Edit: test plays the role of values here, except that it is a std::vector rather than a raw array, which is usually the better choice.

IMHO, this code has some advantages: the iterators have built-in EOF handling, and you can extend the set of delimiters very easily. It is also quite error-friendly (especially if you replace atof with a conversion function that throws exceptions).
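As one possible reading of the "atof replacement that uses exceptions" remark, here is a minimal sketch. The helper name to_float is hypothetical and not from the answer; it relies on std::stof (C++11), which throws where atof would quietly return 0.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the original answer): converts one token
// to float and throws on bad input instead of silently returning 0 like atof.
float to_float(const std::string& token)
{
    std::size_t used = 0;
    // std::stof throws std::invalid_argument if no number can be parsed
    // and std::out_of_range if the value does not fit.
    float value = std::stof(token, &used);

    // Tolerate trailing whitespace, e.g. the newline after the last value.
    while (used < token.size() &&
           std::isspace(static_cast<unsigned char>(token[used])))
        ++used;

    // Anything else left over means the token was not a clean number.
    if (used != token.size())
        throw std::invalid_argument("unexpected characters in token: " + token);

    return value;
}
```

Under these assumptions, to_float could be passed to std::transform in place of the lambda, so that a malformed token stops the read with an exception rather than ending up as 0 in the vector.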

灯角 2024-12-04 17:39:57

I wanted to try something based on osgx's suggestion of using scanf:

freopen("testcases.in", "r", stdin);
while( count < total_values)
       {
         if (scanf("%f,",&values[count]) != 1) break; // stop at EOF or on a malformed token
         count++;
       }
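A variant of the same idea, sketched under the assumption that reading from an explicit FILE* is acceptable instead of redirecting stdin with freopen. The function name read_floats and its parameters are illustrative, not from the answer; it checks the return value of fscanf so the loop stops at end of file or at the first token that is not a number.

```cpp
#include <cstddef>
#include <cstdio>

// Illustrative helper: reads up to `capacity` comma-separated floats from
// `in` into `values` and returns how many were actually parsed.
std::size_t read_floats(std::FILE* in, float* values, std::size_t capacity)
{
    std::size_t count = 0;
    // "%f" skips leading whitespace and parses one number; the trailing ","
    // consumes a single comma if one follows. fscanf returns 1 only when a
    // number was stored, so EOF or a bad token ends the loop.
    while (count < capacity && std::fscanf(in, "%f,", &values[count]) == 1)
        ++count;
    return count;
}
```

The caller can compare the returned count against dimension*dimension to detect a short or malformed line. Memory use stays at the output array plus whatever buffer the C library keeps for the stream, since no copy of the whole line is ever made.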