Uploading files to Azure Blob Storage - Parallel.ForEach slower than foreach
I have the following code for uploading a folder from local storage to blob storage, including the folder name itself in the blob names (the code is based on some methods found here: http://blog.smarx.com/posts/pivot-odata-and-windows-azure-visual-netflix-browsing):
public static void UploadBlobDir(CloudBlobContainer container, string dirPath)
{
    // The last URI segment is the folder's own name, e.g. "photos" for "C:\data\photos".
    string dirName = new Uri(dirPath).Segments.Last();
    Parallel.ForEach(enumerateDirectoryRecursive(dirPath), file =>
    {
        // Trim the parent path so the blob name keeps the top-level folder as a prefix,
        // e.g. "photos\sub\file.txt".
        string blobName = Path.Combine(dirName, Path.GetFullPath(file)).Substring(dirPath.Length - dirName.Length);
        container.GetBlobReference(blobName).UploadFile(file);
    });
}
and:
private static IEnumerable<string> enumerateDirectoryRecursive(string root)
{
    foreach (var file in Directory.GetFiles(root))
        yield return file;

    foreach (var subdir in Directory.GetDirectories(root))
        foreach (var file in enumerateDirectoryRecursive(subdir))
            yield return file;
}
This code works and uploads the folder as intended, but it takes an awful lot of time to complete: 20 seconds to upload 25 files of about 40 KB each. So I tried to replace the parallel loop with a regular one, like so:
foreach (var file in enumerateDirectoryRecursive(i_DirPath))
{
    string blobName = Path.Combine(dirName, Path.GetFullPath(file)).Substring(i_DirPath.Length - dirName.Length);
    container.GetBlobReference(blobName).UploadFile(file);
}
Now the upload completes almost instantly (approx. 3 seconds).
It's also important to note that I am working against the storage emulator for development.
Parallel.ForEach should obviously be faster. Does this difference come from the storage emulator's limitations (meaning that when going live, parallel would be faster), or is it something else I might be doing wrong?
1 Answer
In my experience, the storage emulator tells you strictly nothing about the performance you should expect (or not) from actual Azure Storage. The emulator is typically extremely slow.

Then, Parallel.ForEach will only be faster if your transfers happen to be latency-bound rather than I/O-bound. Also note that Parallel.ForEach uses your CPU count as the default degree of parallelism. For latency-bound processes you should typically have many more threads than that, usually 4 to 8 threads per CPU (YMMV).
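For illustration, here is a minimal sketch of how the parallelism could be raised for a latency-bound upload, reusing the same CloudBlobContainer/UploadFile API as in the question; the 8-threads-per-CPU multiplier is only an assumed starting point to tune, not a measured recommendation:

public static void UploadBlobDir(CloudBlobContainer container, string dirPath)
{
    string dirName = new Uri(dirPath).Segments.Last();

    // Latency-bound work benefits from more in-flight requests than CPUs;
    // 4 to 8 threads per CPU is a rough rule of thumb (YMMV).
    var options = new ParallelOptions
    {
        MaxDegreeOfParallelism = Environment.ProcessorCount * 8
    };

    Parallel.ForEach(enumerateDirectoryRecursive(dirPath), options, file =>
    {
        string blobName = Path.Combine(dirName, Path.GetFullPath(file)).Substring(dirPath.Length - dirName.Length);
        container.GetBlobReference(blobName).UploadFile(file);
    });
}

Keep in mind that against the emulator this will likely still be slow; the difference should only show up against real Azure Storage, where per-request latency dominates.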