I am building an application that needs to allow users to upload large images (up to about 100 MB) to the Windows Azure Blob Storage service. Having read Rob Gillen's excellent article on file upload optimization for Windows Azure, I borrowed his approach for doing parallel upload of file chunks, using the CloudBlockBlob.PutBlock() method within a Parallel.For loop (code is available here).
The problem I have is that whenever I try to upload a file I get an "InvalidMd5" exception from the storage client. Suspecting that the problem may be in the development storage, I also tried running the code against my live Azure storage account, but I got the same error. Looking at the traffic with Fiddler I see that the "Content-MD5" header is set to a valid MD5 hash. The description of the error says that "The MD5 value specified in the request is invalid. The MD5 value must be 128 bits and Base64-encoded.", but to the best of my knowledge the value I see being sent in Fiddler is valid (e.g. a91c588092cedbdb1b82c2d3786fd509).
Here is the code I use for calculating the hash (courtesy of Rob Gillen):
    public static string GetMD5HashFromStream(byte[] data)
    {
        MD5 md5 = new MD5CryptoServiceProvider();
        byte[] retVal = md5.ComputeHash(data);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < retVal.Length; i++)
        {
            sb.Append(retVal[i].ToString("x2"));
        }
        return sb.ToString();
    }
And this is the actual call to PutBlock():
    blob.PutBlock(transferDetails[j].BlockId, new MemoryStream(buff), blockHash, options);
I also tried passing the hash like so:
    Convert.ToBase64String(Encoding.UTF8.GetBytes(blockHash))
but the result was the same - "InvalidMd5" error :(
The MD5 hash being passed to PutBlock() with base64 encoding (e.g. YTkxYzU4ODA5MmNlZGJkYjFiODJjMmQzNzg2ZmQ1MDk=) and without it (e.g. a91c588092cedbdb1b82c2d3786fd509) doesn't seem to make a difference.
Rob's code obviously worked for him and I really have no idea what may be causing the problem in my case. The only change I've made to Rob's code is to alter the ParallelUpload() extension method to take a Stream instead of a file name and to dynamically determine the block size depending on the size of the file being uploaded.
Please, if anyone has an idea how to solve this problem, let me know! I will be really grateful! I have already lost two days struggling with this.
Rob, thank you for offering to help and pointing out the difference in the MD5 hashes. Your answer got me thinking in the right direction. I spent another whole day digging into this but luckily (and thanks to your remark :)) I finally managed to resolve the problem. It turned out there were actually two issues in my case:
1) The MD5 hash: I noticed the hash you pasted in your answer is shorter than the one I was getting, but it took me a while to see that yours was roughly half as long. After some experimentation I found out that the GetMD5HashFromStream() method from your test application converts the 16-byte hash generated by MD5CryptoServiceProvider into a 32-character hexadecimal string. This 32-character string was causing the problem: it was being converted to Base64 and passed to the PutBlock() method, so the encoded value represented 32 bytes instead of 16, which is the overlong and therefore invalid hash the blob storage service was complaining about. Here is the code I ended up with:
Original:
and the call to PutBlock():
Final:
Rob, I'm really curious how your code worked in your case and why it didn't in mine - is it something specific to the setup on my machine, or perhaps a differing version of the Azure tools (I'm using v1.2)... Please let me know if you have any idea.
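For anyone reading along, the fix described in point 1 can be sketched roughly as follows. This is a minimal illustration, not the exact code from my project: the class and method names (BlockHashing, GetBase64Md5) are made up for this example, and it uses MD5.Create() rather than MD5CryptoServiceProvider. The idea is simply to Base64-encode the raw 16-byte hash directly instead of first turning it into a hex string.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper for illustration; the names are not from the original code.
public static class BlockHashing
{
    // Returns the MD5 of the block as Base64 of the raw 16 hash bytes,
    // which is the form the "Content-MD5" header expects.
    public static string GetBase64Md5(byte[] data)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(data);   // always 16 bytes (128 bits)
            return Convert.ToBase64String(hash);   // 24 characters, ends in "=="
        }
    }
}
```

The returned string would then be passed as the blockHash argument in the PutBlock() call shown in the question, with no further encoding applied to it.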
2) A bug in the development storage: lots of combing through the web led me to this page that mentions an obscure but apparently known bug in the development storage:
Here is what I came up with to work around it:
You will need to add a reference to Microsoft.ServiceHosting.Tools.dll, which was located in "C:\Program Files\Windows Azure SDK\v1.2\bin" on my machine. Then, I use this method before the Parallel.For loop that processes the file chunks as follows:
I hope this will save someone all the hassles I went through. Rob, thank you once again for helping out :)
tishon,
After seeing this post, I went back and re-tested my code, and I'm thinking that there is a problem with the data being passed (possibly what you are passing into the function?).
One thing that jumped out at me immediately was the MD5 hash you provided... in every case I've tested, my MD5 hashes end with two equals signs, like the following (captured from Fiddler):
Content-MD5: D1Mxthoqhlwm9cC0729mWA==
I'm not a crypto expert, but I know from working with the block IDs for block blobs that if you have invalid/unsafe characters in your block ID prior to converting it to a Base64-encoded value, you'll get invalid data and block IDs that Azure can't interpret.
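The trailing equals signs are a quick way to tell the two encodings apart. Here is a small sketch that compares them for a hex string like the one in the question; the class and method names (Md5EncodingDemo, HexToBytes and so on) are invented for this illustration and are not part of the storage client.

```csharp
using System;
using System.Text;

// Hypothetical demo class; the names are assumptions made for this example.
public static class Md5EncodingDemo
{
    // Parses a hex string such as "a91c58..." into the raw bytes it represents.
    public static byte[] HexToBytes(string hex)
    {
        byte[] bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }
        return bytes;
    }

    // Base64 of the 16 raw hash bytes: the form that ends with "==".
    public static string Base64OfRawHash(string hex)
    {
        return Convert.ToBase64String(HexToBytes(hex));
    }

    // Base64 of the 32 hex characters taken as text: the overlong form.
    public static string Base64OfHexText(string hex)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(hex));
    }
}
```

For a 32-character hex digest, Base64OfRawHash gives 24 characters ending in "==", while Base64OfHexText gives 44 characters ending in a single "=", which matches the YTkx...MDk= value quoted in the question.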