How do I track individual out-of-order objects, then Join() the consecutive ones?

Published 2024-12-09 07:31:43


I'll start by saying this is going to be a little tougher than blindly joining byte[] arrays together.

My big-picture goal is to optimize an application that currently uploads many 512-byte pages to a web server (Azure Page Blob), and reduce that to a single large upload of 4 MB or less. See the links at the bottom of this question for more background as to why.

The short answer to why: this optimization will increase speed (fewer I/Os) and save money over the long term by using Azure sparse files.

Now for the details:

The class will need to

  • Accept data and store it (data is defined as an alignment start, an alignment stop, and the accompanying payload).

  • After N pieces of data arrive, or an event occurs, call ProcessData(). This means it's time to assemble the data according to the boundaries (the stop value of blob1 must align with the start value of blob2).

  • Consecutive data may actually arrive out of order.

  • Non-consecutive data is defined as data the calling app does not send before processData() occurs. In addition, if the entire 512-byte range == zero, then it gets special handling and is treated as non-consecutive.

  • We're dealing with types of byte[], so efficient lists may be complicated here. I'd like to avoid unneeded copies and expansions of the array.
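To make the requirements above concrete, here is a rough sketch of what I imagine the tracking side could look like (all names here are made up, and the zero-check/storage strategy is just one guess, not working code from my app). A SortedDictionary keyed by the cloud offset makes out-of-order arrival a non-issue, and storing an ArraySegment means no bytes get copied until assembly time:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tracker: pages are keyed by cloudOffset in a SortedDictionary,
// so out-of-order arrival costs nothing extra and consecutive runs can later
// be found with a single ordered scan.
class PageTracker
{
    public const int PageSize = 512;

    readonly SortedDictionary<int, ArraySegment<byte>> _pages =
        new SortedDictionary<int, ArraySegment<byte>>();

    public void AddToQueue(byte[] range, int offsetToTransfer, int sizeToTransfer, int cloudOffset)
    {
        // An all-zero 512-byte range is treated as non-consecutive
        // (it stays a sparse hole in the page blob), so skip it entirely.
        bool allZero = true;
        for (int i = 0; i < sizeToTransfer && allZero; i++)
            allZero = range[offsetToTransfer + i] == 0;
        if (allZero) return;

        // Store a view over the caller's array: no copy, no expansion yet.
        _pages[cloudOffset] = new ArraySegment<byte>(range, offsetToTransfer, sizeToTransfer);
    }

    public int Count => _pages.Count;
}
```

One caveat with this approach: storing a view only works if the caller doesn't reuse its buffer before ProcessData() runs; otherwise the Add path would have to copy the 512 bytes up front.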

Make sense? (like mud, I hope not)

The closest I've come so far is writing the method signature: (lame I know)

// This can't be bigger than 4 MB, or smaller than 512 bytes
byte[] BigBlobToUpload = new byte[4 * 1024 * 1024];

/// <summary>
/// Collects consecutive data ranges that will be uploaded
/// </summary>
/// <param name="NameOfTarget">The name of the target (unique per host)</param>
/// <param name="range">The data to be collected, in 512-byte multiples</param>
/// <param name="offsetToTransfer">The "start point" or offset of the data stored in range to be included. Almost always 0.</param>
/// <param name="sizeToTransfer">The length, or end of the range to include. Almost always 512.</param>
/// <param name="cloudOffset">The location this data should be placed in the BigBlobToUpload global var for eventual upload</param>
private void AddToQueue(string NameOfTarget, byte[] range, int offsetToTransfer, int sizeToTransfer, int cloudOffset)
{

}

I just need someone to give me a direction on how to track these things efficiently ... I can handle it from there. Even an abstract direction would help.

Can someone put me in the right conceptual direction on how I should track, and conditionally join, consecutive data ranges?

Not to mention I'm trying to have efficient logic that only expands or copies the array when needed.
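For the joining step, the kind of thing I'm picturing (again a rough sketch with made-up names, not something I have working) is a single ordered pass that cuts a new run wherever pages stop being adjacent or the 4 MB cap would be exceeded, then does exactly one allocation and one BlockCopy per page for each run:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical assembler: walks pages in cloud-offset order, groups adjacent
// pages into runs, and materializes each run with a single exact-size buffer.
static class RunAssembler
{
    public static List<(int Offset, byte[] Payload)> Assemble(
        SortedDictionary<int, ArraySegment<byte>> pages,
        int maxRunBytes = 4 * 1024 * 1024)
    {
        var runs = new List<(int, byte[])>();
        var current = new List<KeyValuePair<int, ArraySegment<byte>>>();
        int runLen = 0;

        foreach (var kv in pages)
        {
            // Adjacent means: this page starts exactly where the last one ended.
            bool adjacent = current.Count > 0 &&
                kv.Key == current[^1].Key + current[^1].Value.Count;

            if (current.Count > 0 && (!adjacent || runLen + kv.Value.Count > maxRunBytes))
            {
                runs.Add(Flush(current, runLen));
                current.Clear();
                runLen = 0;
            }
            current.Add(kv);
            runLen += kv.Value.Count;
        }
        if (current.Count > 0) runs.Add(Flush(current, runLen));
        return runs;
    }

    static (int, byte[]) Flush(List<KeyValuePair<int, ArraySegment<byte>>> run, int len)
    {
        var buf = new byte[len]; // one allocation, sized exactly; never resized
        int pos = 0;
        foreach (var kv in run)
        {
            Buffer.BlockCopy(kv.Value.Array, kv.Value.Offset, buf, pos, kv.Value.Count);
            pos += kv.Value.Count;
        }
        return (run[0].Key, buf);
    }
}
```

That would keep the "expand or copy only when needed" property: nothing is copied at Add time, and at ProcessData time each byte is copied exactly once into its final upload buffer.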
