std::execution interrupts file writing
The program seems to exit without waiting for the file system to apply the pending writes when integrating with std::execution.
I recently tried PPL, TBB, OpenMP, and std::execution to find the fastest parallel library for a particular job running on my machine. The job is to recursively convert some files into another form, which is basically:
#include <iostream>
#include <filesystem>
#include <fstream>
#include <vector>
#include <iterator>
#include <algorithm> // for_each
#include <cstdint>   // uint8_t

using namespace std;
namespace fs = std::filesystem;

const auto CurrentPath = fs::current_path();

fs::path GetOutputPath(const fs::directory_entry& Entry)
{
    return CurrentPath / "Converted" / fs::relative(Entry, CurrentPath);
}

// Convert and save a file entry.
void ConvertFile(const fs::directory_entry& Entry)
{
    ifstream IStream(Entry.path(), ios::in | ios::binary);
    noskipws(IStream);
    ofstream OStream(GetOutputPath(Entry), ios::out | ios::trunc | ios::binary);

    // Read the data.
    // Intentional `uint8_t` for some special needs.
    vector<uint8_t> Data;
    Data.reserve(Entry.file_size());
    Data.assign(istream_iterator<uint8_t>(IStream), {});

    // Some changes to `Data`.
    // Left blank to simplify the problem.

    // Write to the output file.
    OStream.write(reinterpret_cast<const char*>(Data.data()), Data.size());
}

// Convert and save a directory entry.
void ConvertDirectory(const fs::directory_entry& Entry);

// Convert and save an entry.
void Convert(const fs::directory_entry& Entry)
{
    // Recursively loop directories
    if (Entry.is_directory())
    {
        ConvertDirectory(Entry);
    }
    else
    {
        ConvertFile(Entry);
    }
}

void ConvertDirectory(const fs::directory_entry& Entry)
{
    // Not using `recursive_directory_iterator` since its order is not guaranteed.
    // I think manual recursion can always minimize the number of calls to create a directory.
    vector<fs::directory_entry> SubEntries(fs::directory_iterator(Entry), {});
    fs::create_directory(GetOutputPath(Entry));

    // ** Parallel ** part.
    for_each(SubEntries.cbegin(), SubEntries.cend(), Convert);
}

int main(int argc, char *argv[])
{
    ConvertDirectory(fs::directory_entry(CurrentPath / "Test"));
}
With PPL (ppl.h), I change the **Parallel** part to:
concurrency::parallel_for_each(SubEntries.cbegin(), SubEntries.cend(), Convert);
With TBB (oneapi/tbb.h):
tbb::parallel_for_each(SubEntries.cbegin(), SubEntries.cend(), Convert);
With std::execution:
for_each(execution::par, SubEntries.cbegin(), SubEntries.cend(), Convert);
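Note that the policy objects such as execution::par are declared in <execution> (the policy-taking for_each overload itself comes from <algorithm>), so this variant needs one more header on top of those already included above:
#include <execution> // execution::par and the other standard execution policies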
Results:
Test 1: 700 files in 300 dirs, 1.5MB per file, avg values from 10 tests per cell
Test 2: 50k files in 5k dirs, 0.15MB per file, avg values from 20 tests per cell
| | Test 1 | Test 2 |
|---|---|---|
| (Original) | 2.8s | 130s |
| PPL | 0.59s | 37s |
| TBB | 0.58s | 36.8s |
| std::execution | 0.55s | 32.6s |
The improvements are quite obvious.
But the problem is, when I check the output files after each test, I find that the std::execution version may produce as few as 45k output files in Test 2. If I print a message after each write() call, the problem still exists, yet the number of messages is always 50k, so every file is processed; the problem seems more likely to be the write calls not being fully applied to the file system when the program exits.
The number of directories is always correct, and the PPL and TBB versions don't have such a problem.
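To narrow this down, here is a purely diagnostic sketch (not part of the original program; ConvertFileChecked and FailedFiles are hypothetical names introduced only for illustration, reusing GetOutputPath and the headers from the code above plus <atomic>). It closes the output stream explicitly and inspects its error state; a non-zero count after a run would point to silent open/write failures, for example from exhausting open file handles under heavy parallelism, rather than to completed writes being dropped on exit:
#include <atomic>

// Counts entries whose input could not be opened or whose output stream ended in a failed state.
std::atomic<size_t> FailedFiles{0};

void ConvertFileChecked(const fs::directory_entry& Entry)
{
    ifstream IStream(Entry.path(), ios::in | ios::binary);
    noskipws(IStream);
    ofstream OStream(GetOutputPath(Entry), ios::out | ios::trunc | ios::binary);

    vector<uint8_t> Data;
    Data.reserve(Entry.file_size());
    Data.assign(istream_iterator<uint8_t>(IStream), {});

    if (!Data.empty())
    {
        OStream.write(reinterpret_cast<const char*>(Data.data()), Data.size());
    }

    // Flush and close explicitly instead of relying on the destructor.
    OStream.close();

    if (!IStream.is_open() || OStream.fail())
    {
        ++FailedFiles;
    }
}
If FailedFiles stays at zero while output files are still missing after the run, the stream layer at least believes every write and close succeeded.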
I'm using Visual Studio 2022.
Did I write anything wrong? What can I do to prevent this from happening with std::execution?