C++ program runs out of memory when processing large data
I am trying to solve an issue in a C++ program I wrote: I am basically running out of memory. The program is a cache simulator. There is a file with memory addresses collected beforehand, like this:
Thread Address Type Size Instruction Pointer
0 0x7fff60000000 1 8 0x7f058c482af3
There can be 100-500 billion such entries. First, I try to read all of those entries and store them in a vector. While reading, I also build up a set of these addresses (using a map) and store the sequence numbers of each address. A sequence number simply means the position of the address entry in the file (one address can be seen multiple times). For large inputs the program fails during this step with a bad_alloc at around the 30 millionth entry, so I guess I am running out of memory. Please advise on how I can circumvent the problem; is there an alternative way to handle this kind of large data? Thank you very much, and sorry for the long post. I wanted to give some context along with the actual code I am writing.
Below is the relevant code. ParseTraceFile() reads each line and calls StoreTokens(), which extracts the address and size and calls AddAddress(), which actually stores the address in a vector and a map. The class declaration is also given below. The first try block in AddAddress() actually throws the bad_alloc exception.
void AddressList::ParseTraceFile(const char* filename) {
std::ifstream in_file;
std::cerr << "Reading Address Trace File..." << std::endl;
in_file.exceptions(std::ifstream::failbit | std::ifstream::badbit);
char *contents = NULL;
try {
in_file.open(filename, std::ifstream::in | std::ifstream::binary);
in_file.seekg(0, std::ifstream::end);
std::streampos length(in_file.tellg());
if (length < 0) {
std::cerr << "Can not read input file length" << std::endl;
throw ExitException(1);
}
contents = (new char[length]);
in_file.seekg(0, std::ifstream::beg);
in_file.read(contents, length);
in_file.close();
uint64_t linecount = 0, i = 0, lastline = 0, startline = 0;
while (i < static_cast<uint64_t>(length)) {
if (contents[i] == '\n') {  // EOF is not a character value and never appears in the buffer
contents[i] = '\0';
lastline = startline;
startline = i + 1;
++linecount;
if (linecount > 1) {
StoreTokens((contents + lastline), &linecount);
}
}
++i;
}
} catch (std::bad_alloc& e) {
delete [] contents;
std::cerr << "error allocating memory while parsing" << std::endl;
throw;
} catch (std::ifstream::failure &exc1) {
if (!in_file.eof()) {
delete[] contents;
std::cerr << "error in reading address trace file" << exc1.what()
<< std::endl;
throw ExitException(1);
}
}
delete [] contents;  // was leaked on the success path
std::cerr << "Done" << std::endl;
}
//=========================================================
void AddressList::StoreTokens(char* line, uint64_t * const linecount) {
uint64_t address = 0, size = 0;  // initialize in case a line is malformed
char *token = strtok(line, " \t");
uint8_t tokencount = 0;
while (NULL != token) {
++tokencount;
switch (tokencount) {
case 1:
break;
case 2:
address = strtoul(token, NULL, 16);
break;
case 3:
break;
case 4:
size = strtoul(token, NULL, 0);
break;
case 5:
break;
default:
break;
}
token = strtok(NULL, " \t");
}
AddAddress(address, size);
}
//================================================================
void AddressList::AddAddress(const uint64_t& byteaddr, const uint64_t& size) {
//allocate memory for the address vector
try {
// reserve ahead of the current size; reserve(kReserveCount) alone is a
// no-op once capacity already exceeds kReserveCount
if ((sequence_no_ % kReserveCount) == 0)
  address_list_.reserve(address_list_.size() + kReserveCount);
} catch (std::bad_alloc& e) {
std::cerr
<< "error allocating memory for address trace vector, address count"
<< sequence_no_ << std::endl;
throw;
}
uint64_t offset = byteaddr & (CacheParam::Instance()->LineSize() - 1);
//lineaddress = byteaddr >> CacheParam::Instance()->BitsForLine();
// this try block is for allocating memory for the address set and the queue it holds
try {
// splitter
uint64_t templinesize = 0;
do {
Address temp_addr(byteaddr + templinesize);
address_list_.push_back(temp_addr);
address_set_[temp_addr.LineAddress()].push(sequence_no_++);
templinesize = templinesize + CacheParam::Instance()->LineSize();
} while (size + offset > templinesize);
} catch (std::bad_alloc& e) {
address_list_.pop_back();
std::cerr
<< "error allocating memory for address trace set, address count"
<< sequence_no_ << std::endl;
throw;
}
}
//======================================================
typedef std::queue<uint64_t> TimeStampQueue;
typedef std::map<uint64_t, TimeStampQueue> AddressSet;
class AddressList {
public:
AddressList(const char* tracefilename);
bool Simulate(uint64_t *hit_count, uint64_t* miss_count);
~AddressList();
private:
void AddAddress(const uint64_t& byteaddr, const uint64_t& size);
void ParseTraceFile(const char* filename);
void StoreTokens(char* line, uint64_t * const linecount);
std::vector<Address> address_list_;
AddressSet address_set_;
uint64_t sequence_no_;
CacheMemory cache_;
AddressList (const AddressList&);
AddressList& operator=(const AddressList&);
};
The output is like this:
Reading Cache Configuration File...
Cache parameters read...
Reading Address Trace File...
error allocating memory for address trace set, address count 30000000
error allocating memory while parsing
3 Answers
As it seems your datasets will be much larger than your memory, you will have to write an on-disk index. It is probably easiest to import the whole thing into a database and let that build the indexes for you.
A map sorts its input while it is being populated, to optimize lookup times and to provide a sorted output. It sounds like you aren't using the lookup feature, so the optimal strategy is to sort the list using another method. Merge sorting is fantastic for sorting collections that don't fit into memory. Even if you are doing lookups, a binary search into a sorted file will be faster than a naive approach as long as each record is a fixed size.
Forgive me for stating what may be obvious, but the need to store and query large amounts of data in an efficient manner is the exact reason databases were invented. They have already solved all these issues in a better way than you or I could come up with in a reasonable amount of time. No need to reinvent the wheel.