JAVA - Download a binary file (e.g. PDF) from a web server
I need to download a PDF file from a web server to my PC and save it locally.
I used HttpClient to connect to the web server and get the content body:
HttpEntity entity=response.getEntity();
InputStream in=entity.getContent();
String stream = CharStreams.toString(new InputStreamReader(in));
int size=stream.length();
System.out.println("stringa html page LENGTH:"+stream.length());
System.out.println(stream);
SaveToFile(stream);
Then I save the content to a file:
// check CRLF (I don't know if I need to do this)
String[] fix=stream.split("\r\n");
File file=new File("C:\\Users\\augusto\\Desktop\\progetti web\\test\\test2.pdf");
PrintWriter out = new PrintWriter(new FileWriter(file));
for (int i = 0; i < fix.length; i++) {
out.print(fix[i]);
out.print("\n");
}
out.close();
I also tried saving the String content to the file directly:
OutputStream out=new FileOutputStream("pathPdfFile");
out.write(stream.getBytes());
out.close();
But the result is always the same: I can open the PDF file, but I see only white pages. Is the mistake related to the charset encoding of the PDF content between stream and endstream? Does the content between stream and endstream need to be handled in some other way?
I hope this helps avoid some misunderstanding about what I want to do.
This is my login (it works perfectly):
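A minimal sketch (the class and method names are hypothetical) of why a String-based approach can produce blank pages: decoding arbitrary bytes as text and encoding them back is not lossless in general. UTF-8 is assumed here for illustration; the question's code uses the platform default charset via new InputStreamReader(in), so the exact damage depends on that charset. Byte sequences that are invalid in the charset get replaced, so binary (often compressed) data between stream and endstream no longer matches what the server sent, while the plain-ASCII structure around it survives, which would explain why the file still opens.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryStringRoundTrip {

    // Decode arbitrary bytes as UTF-8 text, then encode the text back to bytes.
    // This mimics reading a binary response through a Reader into a String
    // and later calling String.getBytes() to write it out.
    static byte[] roundTrip(byte[] original) {
        String asText = new String(original, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Every possible byte value, like the binary data inside a PDF stream.
        byte[] original = new byte[256];
        for (int i = 0; i < 256; i++) {
            original[i] = (byte) i;
        }
        byte[] copy = roundTrip(original);
        System.out.println("original length: " + original.length);
        System.out.println("after round trip: " + copy.length);
        // Bytes 0x80..0xFF are not valid UTF-8 on their own and get replaced,
        // so the copy differs from the original.
        System.out.println("identical: " + Arrays.equals(original, copy)); // prints: identical: false
    }
}
```

Pure ASCII input (like the "%PDF-1.7" header) comes back unchanged, which is why the damage is easy to miss when looking only at the text parts of the file.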
public static void postForm(){
String cookie="";
try {
System.out.println("POSTFORM ###################################");
String postURL = "http://login.libero.it/logincheck.php";
HttpPost post = new HttpPost(postURL);
post.setHeader("User-Agent", "Chrome/14.0.835.202");
post.setHeader("Referer","http://login.libero.it/?layout=m&service_id=m_mail&ret_url=http://m.mailbeta.libero.it/m/wmm/auth/check");
if(cookieVector.size()>0){
for(int i=0;i<cookieVector.size();i++){
cookie=cookie+cookieVector.elementAt(i).toString().replace("Set-Cookie:", "")+";";
}
post.setHeader("Cookie",cookie);
}
//System.out.println("sequenza cookie post:"+cookie);
List<NameValuePair> params = new ArrayList<NameValuePair>();
params.add(new BasicNameValuePair("SERVICE_ID", "m_mail"));
params.add(new BasicNameValuePair("LAYOUT", "m"));
params.add(new BasicNameValuePair("DEVICE", ""));
params.add(new BasicNameValuePair("RET_URL","http://m.mailbeta.libero.it/m/wmm/auth/check"));
params.add(new BasicNameValuePair("LOGINID", "secret"));
params.add(new BasicNameValuePair("PASSWORD", "secret"));
UrlEncodedFormEntity ent = new UrlEncodedFormEntity(params,HTTP.UTF_8);
System.out.println("stringa urlPost:"+ent.toString());
post.setEntity(ent);
HttpResponse responsePOST = client.execute(post);
System.out.println("Response postForm: " + responsePOST.getStatusLine());
Header[] allHeaders = responsePOST.getAllHeaders();
String location = "";
for (Header header : allHeaders) {
if("location".equalsIgnoreCase(header.getName())) location = header.getValue();
responsePOST.addHeader(header.getName(), header.getValue());
}
cookieVector.clear();
Header[] headerx=responsePOST.getHeaders("Set-Cookie");
System.out.println("array header:"+headerx.length);
for(int i=0;i<headerx.length;i++){
System.out.println("restituito cookie POST:"+headerx[i].getValue());
cookieVector.add(headerx[i]);
//System.out.println("cookie trovato POST:"+cookieVector.elementAt(i));
}
//System.out.println("inseriti"+cookieVector.size()+""+"elements");
//HttpEntity resEntity = responsePOST.getEntity();
// populate redirect information in response
// CHECK LOGIN RESULT
if(location.contains("https://login.libero.it/logincheck.php")){
loginError=1;
}
System.out.println("Redirecting to: " + location);
//EntityUtils.consume(resEntity);
responsePOST.getEntity().consumeContent();
System.out.println("torno a GET:"+"url:"+location+"cookieVector size:"+cookieVector.size());
get(location,"http://login.libero.it/logincheck.php");
} catch (IOException ex) {
Logger.getLogger(LiberoLoginNew.class.getName()).log(Level.SEVERE, null, ex);
}
}
Once logged in, I'm able to access the file's link (PDF, image, doc, etc.). In this case, let's take a PDF file as an example:
public static void httpConnection(String url,String referer,String cookieAuth){
try {
String location="";
String cookie="";
HttpResponse response;
HttpGet get;
HttpEntity respEntity;
Referer=referer;
System.out.println("HTTPCONNECTION ################################");
System.out.println("connessione a:"+url+"............");
get = new HttpGet(url);
if(referer.length()>0){
//httpget.setHeader("Referer",referer );
}
if(attachmentURL.size()==0){
get.setHeader("User-Agent", "Chrome/14.0.835.202");
}else{
get.setHeader("Accept-charset", "UTF-8");
get.setHeader("Content-type", "application/pdf");
}
if(cookieVector.size()>0){
System.out.println("iserisco cookie da vector");
for(int i=0;i<cookieVector.size();i++){
cookie=cookie+cookieVector.elementAt(i).toString().replace("Set-Cookie:", "")+";";
}
get.setHeader("Cookie", cookie);
}else if(cookieAuth.length()>0){
System.out.println("inserisco cookieAuth....");
System.out.println("valore cookieSession:"+cookieAuth);
get.setHeader("Cookie",cookieAuth.replace("Set-Cookie:", "")+";");
}
response = client.execute(get);
cookieVector.clear();//reset cookie
System.out.println("home get: " + response.getStatusLine());
Header[] headery=response.getAllHeaders();
for(int j=0;j<headery.length;j++){
System.out.println(headery[j].getName()+" "+" VALUE:"+" "+headery[j].getValue());
}
Header[] headerx=response.getHeaders("Set-Cookie");
System.out.println("array header:"+headerx.length);
System.out.print("httpconnection SERVER HEADERS ###############");
for(int i=0;i<headerx.length;i++){
if("location".equalsIgnoreCase(headerx[i].getName())){
location = headerx[i].getValue();
//ResponseGET.addHeader(headerx[i].getName(), header.getValue());
}
//System.out.println(headerx[i].getValue());
cookieVector.add(headerx[i]);
}
//STREAM CONTENT BODY
HttpEntity entity2=response.getEntity();
InputStream in=entity2.getContent(); // <== THIS IS HOW I GET THE RESPONSE STREAM
if(attachmentURL.size()>0){
saveAttachment(in);//SAVE FILE <==
}else{
from(in,htmlpage);//Parse and grab: message title,subject,attachments. If attachments are found then come back here and execute the method saveAttachment.
in.close();
}
} catch (IOException ex) {
Logger.getLogger(LiberoLoginNew.class.getName()).log(Level.SEVERE, null, ex);
}
}
The httpConnection method works and I can download the file!
Server Response:
Date VALUE: Fri, 18 Nov 2011 13:09:46 GMT
Server VALUE: Apache/2.2.21 (Unix) mod_jk/1.2.23
Set-Cookie VALUE: MST_PVP=tiQZO3nbl9_5f_OQXtJP32YiqQx_5f_kSh6F6Io7r3xS; Domain=m.libero.it; Path=/
Content-Type VALUE: application/octet-stream
Expires VALUE: Fri, 18 Nov 2011 15:09:46 GMT
Transfer-Encoding VALUE: chunked
Example of response body:
%PDF-1.7
1 0 obj % entry point
<<
/Type /Catalog
/Pages 2 0 R
>>
endobj
2 0 obj
<<
/Type /Pages
/MediaBox [ 0 0 200 200 ]
/Count 1
/Kids [ 3 0 R ]
>>
endobj
3 0 obj
<<
/Type /Page
/Parent 2 0 R
/Resources <<
/Font <<
/F1 4 0 R
>>
>>
/Contents 5 0 R
>>
endobj
4 0 obj
<<
/Type /Font
/Subtype /Type1
/BaseFont /Times-Roman
>>
endobj
5 0 obj % page content
<<
/Length 44
>>
stream
BT
70 50 TD
/F1 12 Tf
(Hello, world!) Tj
ET
endstream
endobj
xref
0 6
0000000000 65535 f
0000000010 00000 n
0000000079 00000 n
0000000173 00000 n
0000000301 00000 n
0000000380 00000 n
trailer
<<
/Size 6
/Root 1 0 R
>>
startxref
492
%%EOF
Now, let's start from here.
Can you please tell me what I have to do to save the stream to a file?
########### SOLVED:
To save a file locally from the stream data while respecting its binary nature, I did this:
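import java.io.*;

public void saveFile(InputStream is){
    // Copy the raw bytes one by one: no Reader/Writer, so no charset conversion.
    // try-with-resources (Java 7+) closes both streams even if an error occurs.
    try (InputStream in = is;
         DataOutputStream out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(new File("test.pdf"))))) {
        int c;
        while((c = in.read()) != -1) {
            out.writeByte(c);
        }
    }catch(IOException e) {
        System.err.println("Error Writing/Reading Streams.");
    }
}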
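The loop above calls is.read() and out.writeByte() once per byte, which adds per-call overhead on large files. A minimal sketch of the usual alternative: copy through a fixed-size byte[] buffer, so memory use stays constant regardless of file size. The class name StreamCopy, the File parameter and the 8 KB buffer size are illustrative assumptions, not part of the original answer.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {

    // Copies everything from in to out through a fixed-size buffer, so memory use
    // stays the same whether the download is 50 KB or 5 GB. Returns bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // 8 KB: an arbitrary, commonly used size
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // write only the n bytes actually read
            total += n;
        }
        return total;
    }

    // Saves the whole stream to target; both streams are closed afterwards.
    static long saveFile(InputStream is, File target) throws IOException {
        try (InputStream in = is; OutputStream out = new FileOutputStream(target)) {
            return copy(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i; // cycles through every byte value, like real binary content
        }
        File target = File.createTempFile("download", ".pdf");
        target.deleteOnExit();
        long copied = saveFile(new ByteArrayInputStream(data), target);
        System.out.println("copied " + copied + " bytes"); // prints: copied 100000 bytes
    }
}
```

On Java 9 and later, InputStream.transferTo(OutputStream) performs the same buffered copy in a single call.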
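If you want a more efficient method, you can use IOUtils from Apache Commons IO and do this:
import java.io.*;
import org.apache.commons.io.IOUtils;

public void saveFile(InputStream is) throws IOException {
    // Note: toByteArray() loads the whole file into memory before writing it.
    try (InputStream in = is;
         OutputStream os = new FileOutputStream(new File("test.pdf"))) {
        byte[] bytes = IOUtils.toByteArray(in);
        os.write(bytes);
    }
}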
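IOUtils.toByteArray() keeps the entire document in memory, which can be a problem for very large files. Since Java 7 the standard library can stream an InputStream straight into a file with java.nio.file.Files.copy(InputStream, Path, CopyOption...). A minimal sketch, assuming Java 7 or later; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class NioSave {

    // Streams the InputStream directly into the target file without first
    // loading the whole document into a byte[]. Returns the number of bytes written.
    static long saveFile(InputStream is, Path target) throws IOException {
        try (InputStream in = is) {
            // REPLACE_EXISTING overwrites an existing file instead of failing
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("test", ".pdf");
        byte[] pdfStart = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        long n = saveFile(new ByteArrayInputStream(pdfStart), target);
        System.out.println("wrote " + n + " bytes to " + target);
        Files.delete(target);
    }
}
```

Without StandardCopyOption.REPLACE_EXISTING, Files.copy throws FileAlreadyExistsException when the target file already exists (as it does here, since createTempFile creates it).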
Comments (4)
Never store binary data in a String. Never use PrintWriter for binary data. Never write binary files line by line.
I don't want to be harsh or impolite, but these three "nevers" have to take root in your mind! :)
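A small sketch (hypothetical names) of why writing line by line is harmful even for the ASCII parts of a PDF: the xref table stores the byte offset of each object, and a stream's /Length entry is a byte count. The question's split("\r\n") / print("\n") loop turns every CRLF into LF, so every byte after the first line break shifts, and any CR LF pair that happens to occur inside binary stream data is altered as well.

```java
import java.nio.charset.StandardCharsets;

public class LineEndingRewrite {

    // Mimics the question's loop: split on "\r\n", then print each piece followed by "\n".
    static String rewriteLineEndings(String text) {
        StringBuilder sb = new StringBuilder();
        for (String line : text.split("\r\n")) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // First two lines of a PDF, written with CRLF line endings.
        String original = "%PDF-1.7\r\n1 0 obj\r\n";
        String rewritten = rewriteLineEndings(original);
        int before = original.getBytes(StandardCharsets.US_ASCII).length;
        int after = rewritten.getBytes(StandardCharsets.US_ASCII).length;
        // Object 1 used to start at byte 10; after the rewrite it starts at byte 9,
        // so an xref entry recording offset 10 would now point one byte too far.
        System.out.println("bytes before: " + before + ", after: " + after);
        // prints: bytes before: 19, after: 17
    }
}
```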
You can see this page for an example of how to download a binary file. I don't like this example because it caches the whole document in memory (what happens if it is 5 GB?), but you can start from it. :)
Use Apache Commons IO FileUtils. I tried it with a small PDF and a 60 MB JAR. Works great!
Can't you just take the link?
Let jsoup do the hard work of downloading the response as bytes.
Write the bytes using Apache Commons IO FileUtils.