Improving space complexity when reading a file
I have an arbitrarily long line of integers (or floating point values) separated by commas in a file:
1,2,3,4,5,6,7,8,2,3,4,5,6,7,8,9,3,... (can go up to >100 MB)
Now, I have to read these values and store them in an array.
My current implementation looks like this:
float* read_line(int dimension)
{
    float *values = new float[dimension*dimension]; // a line will have dimension^2 values
    std::string line;
    char *token = NULL, *buffer = NULL, *tmp = NULL;
    int count = 0;
    getline(file, line); // file: the ifstream mentioned below, declared elsewhere
    buffer = new char[line.length() + 1]; // second copy of the line, because strtok modifies its input
    strcpy(buffer, line.c_str());
    for( token = strtok(buffer, ","); token != NULL; token = strtok(NULL, ","), count++ )
    {
        values[count] = strtod(token, &tmp);
    }
    delete[] buffer; // allocated with new[], so it must be released with delete[]
    return values;
}
I don't like this implementation because:
- Using ifstream, the entire file is loaded into memory and then cloned into a float[]
- There is unnecessary duplication (conversion from std::string to const char*)
What are ways to optimize memory utilization?
Thanks!
Comments (3)
Something like this?
Using boost tokenizer and istreambuf_iterator:
edit: test is what I use for values, except it's a std::vector instead of an array, which is usually the better choice. IMHO, this code has some advantages: the iterators have built-in EOF handling, and you can extend the set of delimiters very easily. It's quite error-friendly (especially when you would use an atof replacement that uses exceptions).
I wanted to try something based on osgx's suggestion of using scanf:
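The code that followed this comment is missing from this page. A minimal sketch of what a scanf-style reader could look like, assuming the file is opened as a C FILE* and that values are separated by single commas; read_values_scanf and expected are hypothetical names, not the commenter's:

```cpp
#include <cstdio>
#include <vector>

// Hypothetical helper (not the commenter's code): parses one line of
// comma-separated floats straight from the file, so the text of the line
// is never stored in memory as a whole.
std::vector<float> read_values_scanf(std::FILE* fp, std::size_t expected = 0)
{
    std::vector<float> values;
    values.reserve(expected);           // optional: one allocation if the count is known

    float v;
    while (std::fscanf(fp, "%f", &v) == 1) {
        values.push_back(v);
        const int c = std::fgetc(fp);   // character right after the number
        if (c != ',') break;            // newline or EOF ends the line
    }
    return values;
}
```

Here stdio's own buffer is the only intermediate storage, so there is no std::string and no strtok copy. Malformed input simply stops the loop, so a caller that knows the expected count may want to compare it against values.size().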