How do I strip the header information when uploading files with an HttpModule?

Posted on 2024-07-23 11:43:11

I've created an HttpModule in ASP.NET to allow users to upload large files. I found some sample code online that I was able to adapt for my needs. I grab the file if it is a multi-part message and then I chunk the bytes and write them to disk.

The problem is that the file is always corrupt. After doing some research, it turns out that HTTP headers or message-body markers are prepended to the first bytes I receive. I can't figure out how to parse those bytes out so that I only get the file.

Extra data / junk is prepended to the top of the file such as this:

-----------------------8cbb435d6837a3f
Content-Disposition: form-data; name="file"; filename="test.txt"
Content-Type: application/octet-stream

This kind of header information of course corrupts the file I am receiving so I need to get rid of it before I write the bytes.

Here is the code I wrote to handle the upload:

public class FileUploadManager : IHttpModule
{
    public int BUFFER_SIZE = 1024;

    protected void app_BeginRequest(object sender, EventArgs e)
    {
        // get the context we are working under
        HttpContext context = ((HttpApplication)sender).Context;

        // make sure this is multi-part data
        if (context.Request.ContentType.IndexOf("multipart/form-data") == -1)
        {
            return;
        }

        IServiceProvider provider = (IServiceProvider)context;
        HttpWorkerRequest wr = 
        (HttpWorkerRequest)provider.GetService(typeof(HttpWorkerRequest));

        // only process this file if it has a body and is not already preloaded
        if (wr.HasEntityBody() && !wr.IsEntireEntityBodyIsPreloaded())
        {
            // get the total length of the body
            int iRequestLength = wr.GetTotalEntityBodyLength();

            // get the initial bytes loaded
            int iReceivedBytes = wr.GetPreloadedEntityBodyLength();

            // open file stream to write bytes to
            using (System.IO.FileStream fs = 
            new System.IO.FileStream(
               @"C:\tempfiles\test.txt", 
               System.IO.FileMode.CreateNew))
            {
                // *** NOTE: This is where I think I need to filter the bytes 
                // received to get rid of the junk data but I am unsure how to 
                // do this?

                int bytesRead;
                // Create an input buffer to store the incoming data
                byte[] byteBuffer = new byte[BUFFER_SIZE];
                while (iReceivedBytes < iRequestLength)
                {
                    // read the next chunk of the file
                    bytesRead = wr.ReadEntityBody(byteBuffer, byteBuffer.Length);
                    if (bytesRead <= 0)
                    {
                        // no more data (for example, the client disconnected)
                        break;
                    }

                    // write only the bytes actually read in this chunk
                    fs.Write(byteBuffer, 0, bytesRead);
                    iReceivedBytes += bytesRead;

                    // write bytes so far of file to disk
                    fs.Flush();
                }
            }
        }
    }
}

How would I detect and parse out this header junk information in order to isolate just the file bits?

晨光如昨 2024-07-30 11:43:11

Use the InputStreamEntity class as follows:

 InputStreamEntity reqEntity = new InputStreamEntity(new FileInputStream(filePath), -1);
 reqEntity.setContentType("binary/octet-stream");
 httppost.setEntity(reqEntity);
 HttpResponse response = httpclient.execute(httppost);

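If you send the file as above, the request body is just the raw file bytes. No boundary token is added before or after the data, and no Content-Disposition or Content-Type part headers arrive at the server, so you will not see anything like this: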

-----------------------8cbb435d6837a3f
Content-Disposition: form-data; name="file"; filename="test.txt"
Content-Type: application/octet-stream

-----------------------8cbb435d6837a3f

夕嗳→ 2024-07-30 11:43:11

What you're running into is the boundary used to separate the various parts of the HTTP request. There should be a header at the beginning of the request called Content-Type, and within that header there's a boundary declaration like so:

Content-Type: multipart/mixed;boundary=gc0p4Jq0M2Yt08jU534c0p

Once you find this boundary, simply split your request on the boundary with two hyphens (--) prepended to it. In other words, split your content on:

"--" + context.Request.ContentType.Split(new[] { "boundary=" }, StringSplitOptions.None)[1]

That's only a rough sketch, but it should get the point across. This should divide the multipart form data into the appropriate sections.

For more info, see RFC 1341 (the multipart definition was later carried forward into RFC 2046).

Note that the final boundary also has two hyphens appended to its end, so the request finishes with "--" + boundary + "--".
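As a minimal sketch of the boundary lookup described above: the helper below pulls the boundary parameter out of a Content-Type value. The class and method names (MultipartBoundary, GetBoundary) are made up for illustration and are not part of ASP.NET. It assumes the parameters are separated by semicolons and that the value may be wrapped in double quotes.

```csharp
using System;

public static class MultipartBoundary
{
    // Returns the boundary parameter of a Content-Type value,
    // or null when the header has no boundary parameter.
    public static string GetBoundary(string contentType)
    {
        if (contentType == null)
        {
            return null;
        }

        const string key = "boundary=";
        foreach (string piece in contentType.Split(';'))
        {
            string parameter = piece.Trim();
            if (parameter.StartsWith(key, StringComparison.OrdinalIgnoreCase))
            {
                // The value may be wrapped in double quotes.
                return parameter.Substring(key.Length).Trim('"');
            }
        }

        return null;
    }
}
```

For the header shown earlier, GetBoundary("multipart/mixed;boundary=gc0p4Jq0M2Yt08jU534c0p") returns gc0p4Jq0M2Yt08jU534c0p. Each part is then introduced by "--" followed by that value, and the last one is closed by the same string with a further "--" appended.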

EDIT: Okay, so the problem you're running into is that you're not breaking the form data into the necessary components. The sections of a multipart/form-data request can each individually be treated as separate requests (meaning they can contain headers). What you should probably do is read the bytes into a string:

string formData = Encoding.ASCII.GetString(byteBuffer);

split into multiple strings based on the boundary:

string boundary = "--" + context.Request.ContentType.Split( new[] { "boundary=" }, StringSplitOptions.None )[1];
string[] parts = formData.Split( new[] { boundary }, StringSplitOptions.None );

Loop through each string, separating headers from content. Since you actually want the byte values of the content, keep track of the data offset and copy from the original buffer, since converting from ASCII back to bytes might not work properly (I could be wrong, but I'm paranoid):

int dataOffset = 0;   // offset of the current part within byteBuffer
for( int i=0; i < parts.Length; i++ ){
    string part = parts[i];
    int headerEnd = part.IndexOf( "\r\n\r\n" );
    if( headerEnd >= 0 ){   // the preamble and the closing "--" have no headers
        string header = part.Substring( 0, headerEnd );
        int bodyStart = dataOffset + headerEnd + 4;
        // the last two characters are the CRLF that precedes the next boundary
        int bodyLength = Math.Max( 0, part.Length - headerEnd - 4 - 2 );
        byte[] body = new byte[ bodyLength ];
        Array.Copy( byteBuffer, bodyStart, body, 0, bodyLength );

        // body now contains your binary data
    }
    dataOffset += part.Length + boundary.Length;
}

NOTE: This is untested, so it may require some tweaking.
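If the round trip through an ASCII string is a concern, the same idea can be sketched directly on bytes. The version below is only an illustration under strong assumptions: the whole request body is already in one byte array, and only the first part is wanted. The names MultipartSketch, IndexOf, and ExtractFirstPart are hypothetical helpers, not framework APIs.

```csharp
using System;
using System.Text;

public static class MultipartSketch
{
    // Returns the index of the first occurrence of pattern in data
    // at or after start, or -1 if it does not occur.
    public static int IndexOf(byte[] data, byte[] pattern, int start)
    {
        for (int i = start; i <= data.Length - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j])
            {
                j++;
            }
            if (j == pattern.Length)
            {
                return i;
            }
        }
        return -1;
    }

    // Returns the content bytes of the first part, or null when the
    // part headers or the following boundary are not in the buffer.
    public static byte[] ExtractFirstPart(byte[] body, string boundary)
    {
        // A blank line (CRLF CRLF) ends the part headers.
        byte[] headerEnd = Encoding.ASCII.GetBytes("\r\n\r\n");
        // The content ends at the CRLF that precedes the next delimiter.
        byte[] closing = Encoding.ASCII.GetBytes("\r\n--" + boundary);

        int headersStop = IndexOf(body, headerEnd, 0);
        if (headersStop < 0)
        {
            return null;
        }
        int contentStart = headersStop + headerEnd.Length;

        int contentStop = IndexOf(body, closing, contentStart);
        if (contentStop < 0)
        {
            return null;
        }

        byte[] content = new byte[contentStop - contentStart];
        Array.Copy(body, contentStart, content, 0, content.Length);
        return content;
    }
}
```

Because the upload in the question arrives in 1024-byte chunks, a real module would also have to handle part headers or a boundary that straddle two chunks, for example by carrying the tail of each chunk over into the next search. That part is not shown here.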
