Improving Azure Blob Storage query speed

Posted 2024-11-06 11:50:07

We currently have blob storage with thousands of files under the same Azure container. Our file naming convention is something like this:

StorageName\Team\SubTeam\FileName

I'm writing a tool that displays the files for each particular subteam. The code gets the list of blobs for the container and then tries to match each blob to the correct Team\SubTeam (see below for sample code).

This works but is extremely slow, because I need to go through all the files to see whether they match a particular subteam. Is there some way to improve the speed of the query? I can think of optimizations such as "find the first file that matches the team you are looking for, then exit the loop early as soon as you hit a different team", but that would assume that the blob list is sorted, and it wouldn't fix the worst-case scenario.

Unfortunately, splitting the files across different containers is not an option at this time.

Here is sample code:

IEnumerable<IListBlobItem> blobs = blobContainer.ListBlobs(
    new BlobRequestOptions()
    {
        UseFlatBlobListing = true,
        BlobListingDetails = BlobListingDetails.Metadata
    }).OfType<CloudBlob>();

foreach (var blob in blobs)
{
    var cloudy = blob as CloudBlob;

    // For a URI like .../container/Team/SubTeam/FileName,
    // Segments[0] is "/", Segments[1] is the container and Segments[2] is the team.
    string blobTeamId = cloudy.Uri.Segments[2].Trim('/');
    if (blobTeamId != teamId)
        continue;

    // Do something interesting with the file
}

Comments (3)

演出会有结束 2024-11-13 11:50:07

1st Solution
With the REST interface you can request

http://somwhere.com/mycontainername/?restype=container&comp=list&delimiter=/&prefix=\Team\SubTeam

and this returns an XML document containing only the files in the sub team "folder" (I know it's not a folder, but it looks like one in the tools).

You might need to generate a shared access signature to be able to access it; you append the signature to the end of the URL.

Check out here, where it shows that you can filter by blob name prefix.
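To make the REST approach concrete, here is a minimal sketch of building the request URL and pulling blob names out of the XML response. The class name BlobListRest, both method names, and the example account URL are hypothetical. It also assumes blob names use "/" as the separator (as the Uri.Segments call in the question suggests), so the prefix is written as Team/SubTeam/ rather than with backslashes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class BlobListRest
{
    // Builds a List Blobs URL restricted to one blob name prefix.
    // containerUrl is a hypothetical value such as
    // "https://myaccount.blob.core.windows.net/mycontainername".
    // sasQuery is an optional shared access signature query string.
    public static string BuildListUrl(string containerUrl, string prefix, string sasQuery = null)
    {
        string url = containerUrl.TrimEnd('/')
            + "?restype=container&comp=list"
            + "&delimiter=" + Uri.EscapeDataString("/")
            + "&prefix=" + Uri.EscapeDataString(prefix);

        // The signature, if any, is appended as further query parameters.
        return string.IsNullOrEmpty(sasQuery) ? url : url + "&" + sasQuery.TrimStart('?');
    }

    // Extracts blob names from the XML body of a List Blobs response.
    // <BlobPrefix> entries (virtual sub-folders) are skipped.
    public static List<string> ParseBlobNames(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("Blob")
            .Select(b => (string)b.Element("Name"))
            .ToList();
    }
}
```

The filtering happens on the server, so only blobs under the prefix are transferred. Large listings come back in pages: the response carries a NextMarker element whose value is passed back as the marker query parameter to fetch the next page.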

2nd Solution
This is probably closer to what you want. If you can use the new storage client that was updated in the Azure SDK 1.3, you can now use

IEnumerable blobList = client.ListBlobsWithPrefix("Team/SubTeam");

where client is an instance of CloudBlobClient.
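A slightly fuller sketch of the same call, assuming the legacy Microsoft.WindowsAzure.StorageClient library that the question's code appears to use. The class and method names (TeamBlobs, ListForSubTeam) are hypothetical. One caveat worth checking against your SDK version: as far as I recall, when the listing is done on CloudBlobClient rather than on a container, the prefix is expected to start with the container name, so it is included here.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure.StorageClient;

public static class TeamBlobs
{
    // Lists only the blobs under container/team/subTeam/ by letting the
    // service filter on the name prefix, instead of enumerating the whole
    // container and filtering on the client.
    public static IEnumerable<CloudBlob> ListForSubTeam(
        CloudBlobClient client, string containerName, string team, string subTeam)
    {
        // Assumption: blob names use "/" as the separator.
        string prefix = string.Format("{0}/{1}/{2}/", containerName, team, subTeam);

        // Flat listing returns the blobs themselves rather than virtual directories.
        var options = new BlobRequestOptions { UseFlatBlobListing = true };

        return client.ListBlobsWithPrefix(prefix, options).OfType<CloudBlob>();
    }
}
```

The trailing slash on the prefix keeps a sub team called "Sub" from also matching "SubTeam".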

EDIT - 18 Nov 2013
It looks like resttype is no longer supported as a parameter; it should be restype. This seems to have happened quietly over the weekend. I have changed the URL example above.

甜警司 2024-11-13 11:50:07

Just an update...

You can get a list of blobs by calling GetDirectoryReference and then listing the blobs in that directory:

var subDirectory = blobContainer.GetDirectoryReference(String.Format("{0}/", folder));
return subDirectory.ListBlobs(false, BlobListingDetails.Metadata);

ま柒月 2024-11-13 11:50:07

Do you really need BlobListingDetails.Metadata? It causes a lot of extra information to be downloaded. I think all you need is the name.
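As a rough illustration, this is the question's listing call with the metadata request dropped. It assumes the same legacy Microsoft.WindowsAzure.StorageClient API as the question; the class and method names (LeanListing, ListWithoutMetadata) are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure.StorageClient;

public static class LeanListing
{
    // Same flat listing as in the question, but without asking the service
    // to include each blob's user-defined metadata in the response.
    public static IEnumerable<CloudBlob> ListWithoutMetadata(CloudBlobContainer blobContainer)
    {
        var options = new BlobRequestOptions
        {
            UseFlatBlobListing = true,
            BlobListingDetails = BlobListingDetails.None
        };

        return blobContainer.ListBlobs(options).OfType<CloudBlob>();
    }
}
```

Each returned blob still exposes its Uri, which is all the team check in the question needs. This only trims the response size; every blob in the container is still enumerated, so it works best combined with a prefix filter.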
