Can I write multiple byte arrays to an HttpClient without client-side buffering?
The Problem
I would like to upload very large files (up to 5 or 6 GB) to a web server using the HttpClient class (4.1.2) from Apache. Before sending these files, I break them into smaller chunks (100 MB, for example). Unfortunately, all of the examples I see for doing a multi-part POST using HttpClient appear to buffer the file contents before sending them (typically, a small file size is assumed). Here is such an example:
import java.io.File;

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.mime.MultipartEntity;
import org.apache.http.entity.mime.content.FileBody;
import org.apache.http.entity.mime.content.StringBody;
import org.apache.http.impl.client.DefaultHttpClient;

HttpClient httpclient = new DefaultHttpClient();
HttpPost post = new HttpPost("http://www.example.com/upload.php");
MultipartEntity mpe = new MultipartEntity();
// Here are some plain-text fields as a part of our multi-part upload
mpe.addPart("chunkIndex", new StringBody(Integer.toString(chunkIndex)));
mpe.addPart("fileName", new StringBody(somefile.getName()));
// Now for a file to include; looks like we're including the whole thing!
FileBody bin = new FileBody(new File("/path/to/myfile.bin"));
mpe.addPart("myFile", bin);
post.setEntity(mpe);
HttpResponse response = httpclient.execute(post);
In this example, it looks like we create a new FileBody object and add it to the MultipartEntity. In my case, where the file could be 100 MB in size, I'd rather not buffer all of that data at once. I'd like to be able to write out that data in smaller chunks (4 MB at a time, for example), eventually writing all 100 MB. I'm able to do this using Java's HttpURLConnection class (by writing directly to the output stream), but that class has its own set of problems, which is why I'm trying to use the Apache offerings.
My Question
Is it possible to write 100 MB of data to an HttpClient, but in smaller, iterative chunks? I don't want the client to have to buffer up to 100 MB of data before actually doing the POST. None of the examples I see seem to allow you to write directly to the output stream; they all appear to pre-package things before the execute() call.
Any tips would be appreciated!
--- Update ---
For clarification, here's what I did previously with the HttpURLConnection class. I'm trying to figure out how to do something similar in HttpClient:
// Get the connection's output stream
out = new DataOutputStream(conn.getOutputStream());
// Write some plain-text multi-part data
out.writeBytes(fieldBuffer.toString());
// Figure out how many loops we'll need to write the 100 MB chunk
int bufferLoops = (dataLength + (bufferSize - 1)) / bufferSize;
// Open the local file (~5 GB in size) to read the data chunk (100 MB)
raf = new RandomAccessFile(file, "r");
raf.seek(startingOffset); // Position the pointer to the beginning of the chunk
// Keep track of how many bytes we have left to read for this chunk
int bytesLeftToRead = dataLength;
// Write the file data block to the output stream
for (int i = 0; i < bufferLoops; i++) {
    // Create an appropriately sized mini-buffer (max 4 MB) for the pieces
    // of this chunk we have yet to read
    byte[] buffer = (bytesLeftToRead < bufferSize) ?
            new byte[bytesLeftToRead] : new byte[bufferSize];
    int bytesRead = raf.read(buffer); // Read ~4 MB from the local file
    out.write(buffer, 0, bytesRead);  // Write that bit to the stream
    bytesLeftToRead -= bytesRead;
}
// Write the final boundary
out.writeBytes(finalBoundary);
out.flush();
3 Answers
If I'm understanding your question correctly, your concern is loading the whole file into memory (right?). If that is the case, you should employ streams (such as a FileInputStream). That way, the whole file doesn't get pulled into memory at once.
If that doesn't help, and you still want to divide the file up into chunks, you could code the server to deal with multiple POSTs, concatenating the data as it receives them, and then manually split up the bytes of the file.
Personally, I prefer my first answer, but either way (or neither, if these don't help), good luck!
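As a minimal sketch of that first suggestion (assuming HttpMime 4.1 on the classpath, with the URL and file path taken from the question): an InputStreamBody hands the stream to HttpClient, which reads from it only as the part is written to the wire. One trade-off: with a raw InputStream the part's length is unknown up front, so the request is not repeatable on retry.
import java.io.File;
import java.io.FileInputStream;

import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.mime.MultipartEntity;
import org.apache.http.entity.mime.content.InputStreamBody;
import org.apache.http.impl.client.DefaultHttpClient;

public class StreamingUploadSketch {
    public static void main(String[] args) throws Exception {
        File file = new File("/path/to/myfile.bin");
        HttpClient httpclient = new DefaultHttpClient();
        HttpPost post = new HttpPost("http://www.example.com/upload.php");

        MultipartEntity mpe = new MultipartEntity();
        // The stream is consumed as the request body is written, so the
        // whole file is never buffered in memory at once.
        mpe.addPart("myFile",
                new InputStreamBody(new FileInputStream(file), file.getName()));
        post.setEntity(mpe);

        HttpResponse response = httpclient.execute(post);
        System.out.println(response.getStatusLine());
    }
}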
Streams are definitely the way to go; I remember doing something similar a while back with some bigger files, and it worked perfectly.
All you need is to wrap your custom content-generation logic into an HttpEntity implementation. This will give you complete control over the process of content generation and content streaming.
And for the record: the MultipartEntity shipped with HttpClient does not buffer file parts in memory prior to writing them out to the connection socket.
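To make that concrete, here is a minimal sketch of such an HttpEntity, built on HttpCore's AbstractHttpEntity; the class name FileChunkEntity and its constructor parameters are illustrative, not part of the library:
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.RandomAccessFile;

import org.apache.http.entity.AbstractHttpEntity;

// Streams one region (e.g. a 100 MB chunk) of a large file in 4 MB pieces.
public class FileChunkEntity extends AbstractHttpEntity {
    private static final int BUFFER_SIZE = 4 * 1024 * 1024; // 4 MB mini-buffer

    private final File file;
    private final long offset;    // where this chunk starts within the file
    private final long chunkSize; // how many bytes of the file to send

    public FileChunkEntity(File file, long offset, long chunkSize) {
        this.file = file;
        this.offset = offset;
        this.chunkSize = chunkSize;
        setContentType("application/octet-stream");
    }

    public boolean isRepeatable() {
        return true; // the file region can be re-read if the POST is retried
    }

    public long getContentLength() {
        return chunkSize; // length is known, so no chunked encoding is required
    }

    public InputStream getContent() throws IOException {
        // Written out via writeTo() only; EntityTemplate in HttpCore uses the same pattern.
        throw new UnsupportedOperationException();
    }

    public void writeTo(OutputStream out) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            raf.seek(offset); // jump to the start of this chunk
            byte[] buffer = new byte[BUFFER_SIZE];
            long remaining = chunkSize;
            while (remaining > 0) {
                int read = raf.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (read == -1) {
                    break; // hit end of file early
                }
                out.write(buffer, 0, read); // at most 4 MB held in memory at a time
                remaining -= read;
            }
        } finally {
            raf.close();
        }
    }

    public boolean isStreaming() {
        return false;
    }

    public void consumeContent() throws IOException {
        // nothing to release; the RandomAccessFile is closed in writeTo()
    }
}
Used as post.setEntity(new FileChunkEntity(file, startingOffset, dataLength)), HttpClient invokes writeTo() against the live connection's output stream, which is the same effect the question's HttpURLConnection code achieves by hand.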