Changing the maximum number of open files in a system-portable way in C/C++

Posted on 2024-11-08 14:31:49


I have a C++ program that transposes a very large matrix. The matrix is too large to hold in memory, so I was writing each column to a separate temporary file, and then concatenating the temporary files once the whole matrix has been processed. However, I am now finding that I am running up against the problem of having too many open temporary files (i.e. the OS doesn't allow me to open enough temporary files). Is there a system portable method for checking (and hopefully changing) the maximum number of allowed open files?

I realise I could close each temp file and reopen only when needed, but am worried about the performance impact of doing this.

My code works as follows (pseudocode - not guaranteed to work):

#include <cstdio>    // tmpnam, remove
#include <cstdlib>   // exit
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

const unsigned int Ncol = 5000;   // For example - could be much bigger.
const unsigned int Nrow = 50000;  // For example - in reality much bigger.

// Report a fatal error and stop.
void error(const string &msg) { cerr << msg << endl; exit(1); }

void transpose(const string &input_filename)
{
    // Stage 1 - create temp files. One stream stays open per column,
    // which is the step that runs into the open-file limit.
    vector<ofstream *> tmp_files(Ncol);  // Vector of temp file pointers.
    vector<string> tmp_filenames(Ncol);  // Vector of temp file names.
    for (unsigned int ui = 0; ui < Ncol; ui++)
    {
        string filename(tmpnam(NULL));  // Get temp filename.
        ofstream *tmp_file = new ofstream(filename.c_str());
        if (!tmp_file->good())
            error("Could not open temp file.");
        (*tmp_file) << "Column" << ui;
        tmp_files[ui] = tmp_file;
        tmp_filenames[ui] = filename;
    }

    // Stage 2 - read input file and write each column to temp file
    ifstream input_file(input_filename.c_str());
    if (!input_file.good())
        error("Could not open input file.");
    for (unsigned int s = 0; s < Nrow; s++)
    {
        for (unsigned int ui = 0; ui < Ncol; ui++)
        {
            int input_num;
            if (!(input_file >> input_num))
                error("Unexpected end of input.");
            (*tmp_files[ui]) << "\t" << input_num;  // Write entry to temp file.
        }
    }
    input_file.close();

    // Stage 3 - concatenate temp files into output file and clean up.
    ofstream output_file("out.txt");
    for (unsigned int ui = 0; ui < Ncol; ui++)
    {
        // Close temp file and free the stream object.
        ofstream *tmp_file = tmp_files[ui];
        (*tmp_file) << endl;
        tmp_file->close();
        delete tmp_file;

        // Read from temp file and write to output file.
        ifstream read_file(tmp_filenames[ui].c_str());
        if (!read_file.good())
            error("Could not open tmp file for reading.");
        string tmp_line;
        getline(read_file, tmp_line);
        output_file << tmp_line << endl;
        read_file.close();

        // Delete temp file.
        remove(tmp_filenames[ui].c_str());
    }
    output_file.close();
}

Many thanks in advance!

Adam

不必在意 2024-11-15 14:31:49

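There are at least two limits:

The operating system may impose a limit; on Unix (sh, bash, and similar shells), use ulimit to change the limit, within the bounds allowed by the system administrator.
The C library implementation may also have a limit; you would probably need to recompile the library to change that.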
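
As a rough illustration of the two limits above, here is a minimal sketch that inspects both. FOPEN_MAX is the standard C macro for the number of streams the library guarantees can be open at once; getrlimit/setrlimit with RLIMIT_NOFILE is the POSIX interface behind ulimit and is not available on every platform. The function names raise_open_file_limit and report_limits are made up for this example, and raising the descriptor limit does not necessarily lift any limit inside the C library itself.

```cpp
#include <cassert>
#include <cstdio>    // FOPEN_MAX
#include <iostream>

#if defined(__unix__) || defined(__APPLE__)
#include <sys/resource.h>  // getrlimit, setrlimit (POSIX only)
#endif

// Hypothetical helper: raise the soft limit on open descriptors to the hard
// limit where POSIX allows it. Returns the soft limit in effect afterwards,
// or 0 if the limit cannot be queried on this platform.
unsigned long long raise_open_file_limit()
{
#if defined(__unix__) || defined(__APPLE__)
    struct rlimit limits;
    if (getrlimit(RLIMIT_NOFILE, &limits) != 0)
        return 0;
    if (limits.rlim_cur < limits.rlim_max) {
        struct rlimit wanted = limits;
        wanted.rlim_cur = limits.rlim_max;
        if (setrlimit(RLIMIT_NOFILE, &wanted) == 0)
            limits = wanted;  // otherwise keep reporting the old soft limit
    }
    return static_cast<unsigned long long>(limits.rlim_cur);
#else
    return 0;  // no POSIX interface; a platform-specific API would be needed
#endif
}

// Hypothetical helper: print both limits for inspection.
void report_limits()
{
    assert(FOPEN_MAX >= 8);  // the C standard requires at least 8
    std::cout << "FOPEN_MAX (guaranteed by the C library): " << FOPEN_MAX << '\n';
    std::cout << "Soft descriptor limit after raising: "
              << raise_open_file_limit() << '\n';
}
```

Calling report_limits() once at start-up would show whether the column count has any chance of fitting under the limits before any temp file is created.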

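A better solution is to avoid opening so many files. In one of my own programs, I wrote a wrapper around the file abstraction (this was in Python, but the principle is the same in C) that keeps track of the current file position in each file and opens/closes files as needed, maintaining a pool of currently open files.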
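
The wrapper idea could look roughly like the following in C++. This is only a sketch, not the Python code mentioned above: the class name FilePool and its members are invented for illustration, it handles append-only output (so reopening in append mode stands in for tracking the file position), and it evicts the least recently used stream when the pool is full.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical pool that keeps at most max_open output streams open.
class FilePool {
public:
    FilePool(std::vector<std::string> names, std::size_t max_open)
        : names_(std::move(names)), max_open_(max_open)
    {
        assert(max_open_ > 0);
        // Truncate every file once so later appends start from empty files.
        for (const std::string &name : names_)
            std::ofstream truncate(name, std::ios::trunc);
    }

    // Append text to logical file `id`, reopening it if necessary.
    void append(std::size_t id, const std::string &text) { stream_for(id) << text; }

    // Flush and close everything that is still open.
    void close_all() { open_.clear(); lru_.clear(); }

    std::size_t open_count() const { return open_.size(); }

private:
    struct Entry {
        std::ofstream stream;
        std::list<std::size_t>::iterator pos;  // position in the LRU list
    };

    std::ofstream &stream_for(std::size_t id)
    {
        auto it = open_.find(id);
        if (it != open_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second.pos);  // mark most recent
            return it->second.stream;
        }
        if (open_.size() >= max_open_) {
            open_.erase(lru_.back());  // stream destructor flushes and closes
            lru_.pop_back();
        }
        lru_.push_front(id);
        Entry &entry = open_[id];
        entry.stream.open(names_.at(id), std::ios::app);
        entry.pos = lru_.begin();
        return entry.stream;
    }

    std::vector<std::string> names_;
    std::size_t max_open_;
    std::list<std::size_t> lru_;  // most recently used id first
    std::unordered_map<std::size_t, Entry> open_;
};
```

One caveat for the transposition in the question: visiting every column in strict round-robin order with more columns than max_open makes every access a reopen, so it would probably pay to buffer several values per column in memory and append them in one call.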


时光暖心i 2024-11-15 14:31:49

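There is no portable way to change the maximum number of open files. Limits like this tend to be imposed by the operating system and are therefore OS-specific.

Your best bet is to reduce the number of files you have open at any one time.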


萌无敌 2024-11-15 14:31:49

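You could normalise the input file into a temporary file so that every entry occupies the same number of characters. You might even consider saving that temporary file as binary (using 4/8 bytes per number instead of 1 byte per decimal digit). That way you can compute the position of each entry in the file from its coordinates in the matrix. You can then access a specific entry with std::istream::seekg, and you don't have to worry about a limit on the number of open files.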
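
A minimal sketch of this approach, assuming the entries fit in a 32-bit integer and the input is whitespace-separated text in row-major order. The function names normalise_to_binary, read_entry and write_transpose are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// Convert a whitespace-separated text matrix into fixed-width binary entries
// (row-major, one std::int32_t each). Returns the number of entries written.
std::size_t normalise_to_binary(const std::string &text_path,
                                const std::string &bin_path)
{
    std::ifstream in(text_path);
    std::ofstream out(bin_path, std::ios::binary);
    std::size_t count = 0;
    std::int32_t value;
    while (in >> value) {
        out.write(reinterpret_cast<const char *>(&value), sizeof value);
        ++count;
    }
    return count;
}

// Read entry (row, col) of a matrix with n_cols columns stored as above.
std::int32_t read_entry(std::ifstream &bin, std::size_t n_cols,
                        std::size_t row, std::size_t col)
{
    assert(col < n_cols);
    const std::streamoff offset =
        static_cast<std::streamoff>((row * n_cols + col) * sizeof(std::int32_t));
    bin.seekg(offset);
    std::int32_t value = 0;
    bin.read(reinterpret_cast<char *>(&value), sizeof value);
    return value;
}

// Write the transpose as text: one tab-separated output line per input column.
void write_transpose(const std::string &bin_path, const std::string &out_path,
                     std::size_t n_rows, std::size_t n_cols)
{
    std::ifstream bin(bin_path, std::ios::binary);
    std::ofstream out(out_path);
    for (std::size_t col = 0; col < n_cols; ++col) {
        for (std::size_t row = 0; row < n_rows; ++row)
            out << (row ? "\t" : "") << read_entry(bin, n_cols, row, col);
        out << '\n';
    }
}
```

Only two files are open at any time. The sketch does one seek per entry, which keeps it short; reading a band of rows per pass would likely cut the number of seeks considerably.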


口干舌燥 2024-11-15 14:31:49

How about just making 1 big file instead of many small temp files? Seek is a cheap operation. And your columns should all be the same size anyway. You should be able to position your file pointer right where you need it to access the column.

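 // something like... (assumes `is` was opened with std::ios::binary
 // and the file holds raw doubles, one column after another)

 std::streamoff column_position =
     static_cast<std::streamoff>(sizeof(double)) * Nrows * column_index ;
 is.seekg(column_position) ;
 std::vector<double> column(Nrows) ;
 // The offset is computed in bytes, so read raw bytes instead of using operator>>.
 is.read(reinterpret_cast<char *>(column.data()), Nrows * sizeof(double)) ;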
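
To show how the whole transposition might look with one scratch file, here is a hedged sketch. It assumes the input is a text matrix of doubles with known dimensions; the function name transpose_via_single_file and the file paths passed to it are placeholders, not part of the snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>   // std::remove
#include <fstream>
#include <string>

// Transpose an n_rows x n_cols text matrix using a single binary scratch
// file in which the columns are stored back to back.
void transpose_via_single_file(const std::string &in_path,
                               const std::string &scratch_path,
                               const std::string &out_path,
                               std::size_t n_rows, std::size_t n_cols)
{
    assert(n_rows > 0 && n_cols > 0);
    std::ifstream in(in_path);
    std::fstream scratch(scratch_path, std::ios::in | std::ios::out |
                                           std::ios::binary | std::ios::trunc);

    // Pass 1: entry (row, col) is written to slot col * n_rows + row.
    for (std::size_t row = 0; row < n_rows; ++row) {
        for (std::size_t col = 0; col < n_cols; ++col) {
            double value = 0.0;
            in >> value;
            scratch.seekp(static_cast<std::streamoff>(
                (col * n_rows + row) * sizeof(double)));
            scratch.write(reinterpret_cast<const char *>(&value), sizeof value);
        }
    }

    // Pass 2: a single sequential read now yields the transposed rows.
    scratch.seekg(0);
    std::ofstream out(out_path);
    for (std::size_t col = 0; col < n_cols; ++col) {
        for (std::size_t row = 0; row < n_rows; ++row) {
            double value = 0.0;
            scratch.read(reinterpret_cast<char *>(&value), sizeof value);
            out << (row ? "\t" : "") << value;
        }
        out << '\n';
    }

    scratch.close();
    std::remove(scratch_path.c_str());  // clean up the scratch file
}
```

Values are printed with the stream's default precision, so the output stream's precision would need raising if the matrix holds doubles that need more than six significant digits.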


鹤仙姿 2024-11-15 14:31:49

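"The matrix is too large to hold in memory." The matrix very likely does fit in your address space, though. (If the matrix doesn't fit in 2^64 bytes, you would need a very impressive file system to hold all those temporary files.) So don't bother with temporary files. Let the OS handle how swapping to disk works. You just need to make sure you access memory in a swap-friendly way. In practice, that means you need some locality of reference. But with 16 GB of RAM, you can have about 4 million pages of RAM mapped. If your number of columns is significantly smaller than that, there should be no problem.

(Don't use a 32-bit system for this; it's not worth the trouble.)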
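
One way to get the locality of reference mentioned above is to transpose in square tiles, so each step touches only a small group of pages in the source and the destination. The sketch below is an assumption about what "swap-friendly" could look like in practice; the function name transpose_blocked and the default tile size of 64 are arbitrary choices, and the vectors are simply allowed to exceed physical RAM on a 64-bit system with enough swap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Transpose a row-major n_rows x n_cols matrix tile by tile.
std::vector<int> transpose_blocked(const std::vector<int> &in,
                                   std::size_t n_rows, std::size_t n_cols,
                                   std::size_t block = 64)
{
    assert(in.size() == n_rows * n_cols && block > 0);
    std::vector<int> out(in.size());
    for (std::size_t r0 = 0; r0 < n_rows; r0 += block) {
        for (std::size_t c0 = 0; c0 < n_cols; c0 += block) {
            const std::size_t r1 = std::min(r0 + block, n_rows);
            const std::size_t c1 = std::min(c0 + block, n_cols);
            // Within a tile, reads and writes stay inside a bounded
            // set of rows of each matrix.
            for (std::size_t r = r0; r < r1; ++r)
                for (std::size_t c = c0; c < c1; ++c)
                    out[c * n_rows + r] = in[r * n_cols + c];
        }
    }
    return out;
}
```

For example, transpose_blocked({1, 2, 3, 4, 5, 6}, 2, 3) returns {1, 4, 2, 5, 3, 6}. A tile size tuned to the page size and element width would probably do better than the default used here.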

