Can I write multiple byte arrays to an HttpClient without client-side buffering?
The Problem
I would like to upload very large files (up to 5 or 6 GB) to a web server using the HttpClient class (4.1.2) from Apache. Before sending these files, I break them into smaller chunks (100 MB, for example). Unfortunately, all of the examples I see for doing a multi-part POST using HttpClient appear to buffer the file contents before sending them (typically, a small file size is assumed). Here is such an example:
import java.io.File;

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.mime.MultipartEntity;
import org.apache.http.entity.mime.content.FileBody;
import org.apache.http.entity.mime.content.StringBody;
import org.apache.http.impl.client.DefaultHttpClient;

HttpClient httpclient = new DefaultHttpClient();
HttpPost post = new HttpPost("http://www.example.com/upload.php");
MultipartEntity mpe = new MultipartEntity();
// Here are some plain-text fields as a part of our multi-part upload
mpe.addPart("chunkIndex", new StringBody(Integer.toString(chunkIndex)));
mpe.addPart("fileName", new StringBody(somefile.getName()));
// Now for a file to include; looks like we're including the whole thing!
FileBody bin = new FileBody(new File("/path/to/myfile.bin"));
mpe.addPart("myFile", bin);
post.setEntity(mpe);
HttpResponse response = httpclient.execute(post);
In this example, it looks like we create a new FileBody object and add it to the MultipartEntity. In my case, where the file could be 100 MB in size, I'd rather not buffer all of that data at once. I'd like to be able to write out that data in smaller chunks (4 MB at a time, for example), eventually writing all 100 MB. I'm able to do this using Java's HttpURLConnection class (by writing directly to the output stream), but that class has its own set of problems, which is why I'm trying to use the Apache offerings.
My Question
Is it possible to write 100 MB of data to an HttpClient, but in smaller, iterative chunks? I don't want the client to have to buffer up to 100 MB of data before actually doing the POST. None of the examples I see seem to allow you to write directly to the output stream; they all appear to pre-package things before the execute() call.
Any tips would be appreciated!
--- Update ---
For clarification, here's what I did previously with the HttpURLConnection class. I'm trying to figure out how to do something similar in HttpClient:
// Get the connection's output stream
out = new DataOutputStream(conn.getOutputStream());
// Write some plain-text multi-part data
out.writeBytes(fieldBuffer.toString());
// Figure out how many loops we'll need to write the 100 MB chunk
int bufferLoops = (dataLength + (bufferSize - 1)) / bufferSize;
// Open the local file (~5 GB in size) to read the data chunk (100 MB)
raf = new RandomAccessFile(file, "r");
raf.seek(startingOffset); // Position the pointer to the beginning of the chunk
// Keep track of how many bytes we have left to read for this chunk
int bytesLeftToRead = dataLength;
// Write the file data block to the output stream
for (int i = 0; i < bufferLoops; i++) {
    // Create an appropriately sized mini-buffer (max 4 MB) for the pieces
    // of this chunk we have yet to read
    byte[] buffer = (bytesLeftToRead < bufferSize) ?
            new byte[bytesLeftToRead] : new byte[bufferSize];
    int bytesRead = raf.read(buffer); // Read ~4 MB from the local file
    out.write(buffer, 0, bytesRead);  // Write that bit to the stream
    bytesLeftToRead -= bytesRead;
}
// Write the final boundary
out.writeBytes(finalBoundary);
out.flush();
3 Answers
If I'm understanding your question correctly, your concern is loading the whole file into memory (right?). If that is the case, you should employ streams (such as a FileInputStream). That way, the whole file doesn't get pulled into memory at once.
If that doesn't help, and you still want to divide the file up into chunks, you could code the server to deal with multiple POSTs, concatenating the data as it receives them, and then manually split up the bytes of the file.
Personally, I prefer my first answer, but either way (or neither, if these don't help), good luck!
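As a minimal sketch of that first suggestion (assuming HttpMime 4.1 on the classpath, with the URL and file path taken from the question): an InputStreamBody hands the stream to HttpClient, which reads from it only as the part is written to the wire. One trade-off: with a raw InputStream the part's length is unknown up front, so the request is not repeatable on retry.
import java.io.File;
import java.io.FileInputStream;

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.mime.MultipartEntity;
import org.apache.http.entity.mime.content.InputStreamBody;
import org.apache.http.impl.client.DefaultHttpClient;

public class StreamingUploadSketch {
    public static void main(String[] args) throws Exception {
        File file = new File("/path/to/myfile.bin");
        HttpClient httpclient = new DefaultHttpClient();
        HttpPost post = new HttpPost("http://www.example.com/upload.php");

        MultipartEntity mpe = new MultipartEntity();
        // The stream is consumed as the request body is written, so the
        // whole file is never buffered in memory at once.
        mpe.addPart("myFile",
                new InputStreamBody(new FileInputStream(file), file.getName()));
        post.setEntity(mpe);

        HttpResponse response = httpclient.execute(post);
        System.out.println(response.getStatusLine());
    }
}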
Streams are definitely the way to go; I remember doing something similar a while back with some bigger files, and it worked perfectly.
All you need is to wrap your custom content-generation logic into an HttpEntity implementation. This will give you complete control over the process of content generation and content streaming.
And for the record: the MultipartEntity shipped with HttpClient does not buffer file parts in memory prior to writing them out to the connection socket.
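To make that concrete, here is a minimal sketch of such an HttpEntity, built on HttpCore's AbstractHttpEntity; the class name FileChunkEntity and its constructor parameters are illustrative, not part of the library:
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.RandomAccessFile;

import org.apache.http.entity.AbstractHttpEntity;

// Streams one region (e.g. a 100 MB chunk) of a large file in 4 MB pieces.
public class FileChunkEntity extends AbstractHttpEntity {
    private static final int BUFFER_SIZE = 4 * 1024 * 1024; // 4 MB mini-buffer

    private final File file;
    private final long offset;    // where this chunk starts within the file
    private final long chunkSize; // how many bytes of the file to send

    public FileChunkEntity(File file, long offset, long chunkSize) {
        this.file = file;
        this.offset = offset;
        this.chunkSize = chunkSize;
        setContentType("application/octet-stream");
    }

    public boolean isRepeatable() {
        return true; // the file region can be re-read if the POST is retried
    }

    public long getContentLength() {
        return chunkSize; // length is known, so no chunked encoding is required
    }

    public InputStream getContent() throws IOException {
        // Written out via writeTo() only; EntityTemplate in HttpCore uses the same pattern.
        throw new UnsupportedOperationException();
    }

    public void writeTo(OutputStream out) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            raf.seek(offset); // jump to the start of this chunk
            byte[] buffer = new byte[BUFFER_SIZE];
            long remaining = chunkSize;
            while (remaining > 0) {
                int read = raf.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (read == -1) {
                    break; // hit end of file early
                }
                out.write(buffer, 0, read); // at most 4 MB held in memory at a time
                remaining -= read;
            }
        } finally {
            raf.close();
        }
    }

    public boolean isStreaming() {
        return false;
    }

    public void consumeContent() throws IOException {
        // nothing to release; the RandomAccessFile is closed in writeTo()
    }
}
Used as post.setEntity(new FileChunkEntity(file, startingOffset, dataLength)), HttpClient invokes writeTo() against the live connection's output stream, which is the same effect the question's HttpURLConnection code achieves by hand.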