I think STL is tripling my application's memory usage

Posted 2024-07-10 19:58:21


I am inputting a 200 MB file into my application, and for a very strange reason its memory usage is more than 600 MB. I have tried vector and deque, as well as std::string and char *, to no avail. I need my application's memory usage to be almost the same as the size of the file I am reading, so any suggestions would be extremely helpful.
Is there a bug that causes so much memory consumption? Can you pinpoint the problem, or should I rewrite the whole thing?

Windows Vista SP1 x64, Microsoft Visual Studio 2008 SP1, 32-bit release build, Intel CPU

The whole application so far:

#include <string>
#include <vector>
#include <iostream>
#include <iomanip>
#include <fstream>
#include <sstream>
#include <iterator>
#include <algorithm>
#include <time.h>



static unsigned int getFileSize (const char *filename)
{
    std::ifstream fs;
    fs.open (filename, std::ios::binary);
    fs.seekg(0, std::ios::beg);
    const std::ios::pos_type start_pos = fs.tellg();
    fs.seekg(0, std::ios::end);
    const std::ios::pos_type end_pos = fs.tellg();
    const unsigned int ret_filesize (static_cast<unsigned int>(end_pos - start_pos));
    fs.close();
    return ret_filesize;
}
void str2Vec (std::string &str, std::vector<std::string> &vec)
{
    int newlineLastIndex(0);
    for (int loopVar01 = str.size(); loopVar01 > 0; loopVar01--)
    {
        if (str[loopVar01]=='\n')
        {
            newlineLastIndex = loopVar01;
            break;
        }
    }
    int remainder(str.size()-newlineLastIndex);

    std::vector<int> indexVec;
    indexVec.push_back(0);
    for (unsigned int lpVar02 = 0; lpVar02 < (str.size()-remainder); lpVar02++)
    {
        if (str[lpVar02] == '\n')
        {
            indexVec.push_back(lpVar02);
        }
    }
    int memSize(0);
    for (int lpVar03 = 0; lpVar03 < (indexVec.size()-1); lpVar03++)
    {
        memSize = indexVec[(lpVar03+1)] - indexVec[lpVar03];
        std::string tempStr (memSize,'0');
        memcpy(&tempStr[0],&str[indexVec[lpVar03]],memSize);
        vec.push_back(tempStr);
    }
}
void readFile(const std::string &fileName, std::vector<std::string> &vec)
{
    static unsigned int fileSize = getFileSize(fileName.c_str());
    static std::ifstream fileStream;
    fileStream.open (fileName.c_str(),std::ios::binary);
    fileStream.clear();
    fileStream.seekg (0, std::ios::beg);
    const int chunks(1000); 
    int singleChunk(fileSize/chunks);
    int remainder = fileSize - (singleChunk * chunks);
    std::string fileStr (singleChunk, '0');
    int fileIndex(0);
    for (int lpVar01 = 0; lpVar01 < chunks; lpVar01++)
    {
        fileStream.read(&fileStr[0], singleChunk);
        str2Vec(fileStr, vec);
    }
    std::string remainderStr(remainder, '0');
    fileStream.read(&remainderStr[0], remainder);
    str2Vec(fileStr, vec);      
}
int main (int argc, char *argv[])
{   
        std::vector<std::string> vec;
        std::string inFile(argv[1]);
        readFile(inFile, vec);
}

Comments (14)

平生欢 2024-07-17 19:58:21


Your memory is being fragmented.

Try something like this:

  #include <windows.h>

  HANDLE heaps[1025];
  DWORD nheaps = GetProcessHeaps((sizeof(heaps) / sizeof(HANDLE)) - 1, heaps);

  for (DWORD i = 0; i < nheaps; ++i)
  {
    // 2 == HEAP_LFH: switch this heap to the low-fragmentation heap.
    ULONG HeapFragValue = 2;
    HeapSetInformation(heaps[i],
                       HeapCompatibilityInformation,
                       &HeapFragValue,
                       sizeof(HeapFragValue));
  }
鹤舞 2024-07-17 19:58:21


If I'm reading this right, the biggest issue is that this algorithm automatically doubles the required memory.

In readFile(), you read the whole file into a set of 'singleChunk'-sized strings (chunks), and then in the last loop of str2Vec() you allocate a temporary string for every newline-separated segment of the chunk. So you're doubling the memory right there.

You've also got a speed issue - str2Vec() makes 2 passes over the chunk to find all the newlines. There's no reason you can't do it in one pass, as sketched below.
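
A minimal single-pass version might look like this (str2VecOnePass is a hypothetical replacement for the asker's str2Vec; anything after the last newline is deliberately left for the caller to carry into the next chunk):

#include <string>
#include <vector>

void str2VecOnePass(const std::string &str, std::vector<std::string> &vec)
{
    std::string::size_type lineStart = 0;
    for (std::string::size_type i = 0; i < str.size(); ++i)
    {
        if (str[i] == '\n')
        {
            // One substr per line replaces the index vector plus memcpy.
            vec.push_back(str.substr(lineStart, i - lineStart));
            lineStart = i + 1;
        }
    }
    // Characters from lineStart to the end are a partial line; the caller
    // should prepend them to the next chunk, as the original remainder
    // logic intends.
}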

小嗷兮 2024-07-17 19:58:21


Another thing you could do is to load the entire file into one block of memory. Then make a vector of pointers to the first character of each line, and at the same time, replace the newline with a \0 so it's null-terminated. (Presuming of course that your strings aren't supposed to have \0 in them.)

It's not necessarily as convenient as having a vector of strings, but having a vector of const char* is potentially "just as good."
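
A rough sketch of that approach, assuming the file opens successfully and is non-empty (all names are illustrative):

#include <cstddef>
#include <cstdio>
#include <fstream>
#include <vector>

int main(int argc, char *argv[])
{
    // Read the whole file into one contiguous buffer.
    std::ifstream file(argv[1], std::ios::binary);
    file.seekg(0, std::ios::end);
    std::vector<char> buffer(static_cast<std::size_t>(file.tellg()));
    file.seekg(0, std::ios::beg);
    file.read(&buffer[0], static_cast<std::streamsize>(buffer.size()));

    // Terminate each line in place and keep only pointers: no per-line
    // string allocations, so the buffer is the only large allocation.
    std::vector<const char*> lines;
    const char *lineStart = &buffer[0];
    for (std::size_t i = 0; i < buffer.size(); ++i)
    {
        if (buffer[i] == '\n')
        {
            buffer[i] = '\0';
            lines.push_back(lineStart);
            lineStart = &buffer[i] + 1;
        }
    }
    std::printf("%lu lines\n", static_cast<unsigned long>(lines.size()));
    return 0;
}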

耳根太软 2024-07-17 19:58:21


The STL containers exist to abstract away memory operations. If you have a hard memory limit, then you can't really abstract those away.

I would recommend using mmap() to read the file in (or, in Windows, MapViewOfFile()).
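
On the asker's platform, that could look roughly like this (minimal error handling; a sketch rather than a drop-in replacement):

#include <windows.h>
#include <cstdio>

int main(int argc, char *argv[])
{
    // Open, map, and view the file read-only; the OS pages it in on
    // demand, so the process never owns a second in-memory copy.
    HANDLE file = CreateFileA(argv[1], GENERIC_READ, FILE_SHARE_READ,
                              NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return 1;

    HANDLE mapping = CreateFileMappingA(file, NULL, PAGE_READONLY, 0, 0, NULL);
    if (mapping == NULL) { CloseHandle(file); return 1; }

    const char *data = static_cast<const char*>(
        MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0)); // 0,0,0 = whole file
    DWORD size = GetFileSize(file, NULL);

    // ... scan data[0..size) for '\n' and store offsets, not copies ...
    std::printf("mapped %lu bytes\n", static_cast<unsigned long>(size));

    UnmapViewOfFile(data);
    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}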

帅的被狗咬 2024-07-17 19:58:21


Inside readFile you have at least 2 copies of your file - the ifstream, and the data copied into your std::vector . As long as you have the file open, and you're copying it like you are, it's going to be hard to get the total memory footprint down below double the file size.

ζ澈沫 2024-07-17 19:58:21


First, how are you determining memory usage? Task Manager is not a suitable tool for that, as what it shows isn't actually memory usage.

Second, apart from your (for some reason?) static variables, the only data that is not freed when you're done reading the file is the vector. So test its capacity, and test the capacity of each string it contains. Find out how much memory each of them uses. You have the tools to determine where the memory is being spent.
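
A small helper along those lines (a sketch; reportStorage is an illustrative name, and per-allocation heap overhead isn't counted). Calling it after readFile() returns shows whether the bytes live in the strings or in the vector's spare capacity:

#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Sum the storage reserved by the vector itself and by every string in it.
void reportStorage(const std::vector<std::string> &vec)
{
    std::size_t total = vec.capacity() * sizeof(std::string);
    for (std::size_t i = 0; i < vec.size(); ++i)
        total += vec[i].capacity();   // heap bytes owned by each string
    std::cout << "vector slots: " << vec.capacity()
              << ", approx bytes: " << total << "\n";
}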

徒留西风 2024-07-17 19:58:21


I think your attempt to write your own buffering strategy is misguided.

The streams have a very good buffering strategy already implemented. If you think you need a larger buffer you can install a basic buffer into the stream without any extra code to control the buffer.

Here is what I came up with:
NB: tested with a text version of the "King James Bible" that I found online.

#include <string>
#include <vector>
#include <list>
#include <fstream>
#include <algorithm>
#include <iterator>
#include <iostream>

class Line: public std::string
{
};

std::istream& operator>>(std::istream& in,Line& line)
{
    // Relatively efficient way to copy a line into a string.
    return std::getline(in,line);
}
std::ostream& operator<<(std::ostream& out,Line const& line)
{
    return out << static_cast<std::string const&>(line) << "\n";
}

void readLinesFromStream(std::istream& stream,std::vector<Line>& lines)
{
    /*
     * Read into a list as this is flexible in memory usage and will not
     * allocate huge chunks of un-required space.
     *
     * Even with huge files the space for list will be insignificant
     * compared to the size of the data.
     *
     * This then allows us to reserve the correct size of the vector
     * Thus avoiding huge memory chunks being prematurely allocated that
     * are not required. It also prevents the internal structure from
     * being copied every time the container is re-sized.
     */
    std::list<Line>     data;
    std::copy(  std::istream_iterator<Line>(stream),
                std::istream_iterator<Line>(),
                std::inserter(data,data.end())
             );

    /*
     * Reserve the correct size in the vector.
     * then copy out of the list into the vector
     */
    lines.reserve(data.size());
    std::copy(  data.begin(),
                data.end(),
                std::back_inserter(lines)
             );
}

void readLinesFromFile(std::string const& name,std::vector<Line>& lines)
{
    /*
     * Set up the file stream and override the default buffer used by the stream.
     * Make it big because we think the istream buffer is insufficient!!!!
     */
    std::ifstream       file;
    std::vector<char>   buffer(10000);
    file.rdbuf()->pubsetbuf(&buffer[0],buffer.size());

    file.open(name.c_str());
    readLinesFromStream(file,lines);
}


int main(int argc,char* argv[])
{
    std::vector<Line>   lines;
    readLinesFromFile(argv[1],lines);

    // Un-comment if your file is larger than 1100 lines.

    // I tested with a copy of the King James bible. 
    // std::cout << "Lines: " << lines.size() << "\n";
    // std::copy(lines.begin() + 1000,lines.begin() + 1100,std::ostream_iterator<Line>(std::cout));
}
夜清冷一曲。 2024-07-17 19:58:21

  1. Do not use std::list. It'll require more memory than vector.
  2. Vector does what's called "doubling": when it runs out of space, it allocates twice the memory it currently has. To avoid that, use the std::vector::reserve() method; you can check the result with the std::vector::capacity() method (note that capacity() >= size()).

Since the number of lines is not known during execution, I see no simple algorithm to avoid the "doubling" issue. Per the comment by slavy13.myopenid.com, the solution is to move the information into another pre-reserved vector after you finish reading (the relevant question is How to downsize std::vector?); a sketch of that move follows.
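
The pre-C++11 idiom that question describes (VS2008 has no shrink_to_fit): copy-construct a temporary, which allocates only size() elements, then swap buffers with it.

#include <string>
#include <vector>

// "Swap trick": after this call, vec owns a buffer sized to its contents,
// and the old doubled buffer is freed when the temporary dies.
void shrinkToFit(std::vector<std::string> &vec)
{
    std::vector<std::string>(vec).swap(vec);
}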

羁客 2024-07-17 19:58:21


Try using a list instead of a vector. Vectors are (almost always) contiguous in memory.

Granted, the fact that the strings inside are (almost always) copy-on-modify and reference-counted should make that less of a problem, but it might help.

你的呼吸 2024-07-17 19:58:21


I don't know if this is relevant because I don't really know what your file looks like.

But you should be aware that std::string is likely to have a considerable space overhead when storing a very short string. And if you're individually new-ing up char* for very short strings, you're also going to see all the allocation block overhead.

How many strings are you putting into that vector, and what's their average length?
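
A quick probe for that overhead, if you want to measure rather than guess (numbers vary by compiler and build):

#include <iostream>
#include <string>

int main()
{
    std::string s("ab");   // a typical very short line
    std::cout << "sizeof(std::string): " << sizeof(std::string) << "\n"
              << "capacity reserved for 2 chars: " << s.capacity() << "\n";
    return 0;
}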

り繁华旳梦境 2024-07-17 19:58:21


Maybe you should elaborate on why you need to read the whole file into memory; I suspect there is a way to do what you want without reading it all in at once. If you really need this functionality, look into memory-mapped files, which are probably going to be more efficient than writing the equivalent yourself. Your internal data structure can then use offsets into the file. Btw, be sure to check whether you need to handle character encoding.

青巷忧颜 2024-07-17 19:58:21


I find that the best way to handle lines is to read-only memory-map the file. Do not bother writing in \0 for \n; instead use pairs of const char*s, like std::pair<const char*, const char*>, or a const char* and a count. If you need to edit the lines, a good way to do it is to make an object that can store either a pointer pair or a std::string with the modified line.

As for saving memory with STL vectors or deques, a good technique is to let the container double until you are done adding to it, then resize it to its real size, which should free the unused memory back to the heap allocator. The memory may still be allocated to the program, though I wouldn't worry about that. Also, instead of starting from the default size, get the file size in bytes first, divide by your best guess at the average characters per line, and reserve that much space up front, as sketched below.
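
For instance (reserveForLines and the parameters are illustrative; 40 is just a guess at average line length):

#include <cstddef>
#include <string>
#include <vector>

// Reserve from the file size and a guessed average line length so the
// vector rarely needs to double while lines are appended.
void reserveForLines(std::vector<std::string> &vec,
                     std::size_t fileSizeBytes,
                     std::size_t avgCharsPerLine)
{
    vec.reserve(fileSizeBytes / avgCharsPerLine);
}

// e.g. reserveForLines(vec, 200 * 1024 * 1024, 40); for a 200 MB file.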

请止步禁区 2024-07-17 19:58:21


You should know that because you declared fileStream as static, it never goes out of scope, meaning the file isn't closed until the very last moment of execution. That certainly ties up some memory. You could explicitly close it just before that last str2Vec to try to help the situation.

Also, you open and close the same file multiple times; just open it once and pass it around by reference (resetting the stream state if needed). Though I imagine you could pull off what you need with a single pass over the file.

Heck, I doubt you really need to know the file size like you do here; you could just read chunk-sized amounts until you get a short read (at which point you are done), as sketched below.

Why don't you explain the goal of the code? I feel a much simpler solution is possible.

小巷里的女流氓 2024-07-17 19:58:21


Growing vectors with push_back() will cause memory fragmentation and inefficient memory usage. I'd try using lists instead, and only create a vector (if you need one) once you know exactly how many elements it will require.
