Java URL - Google Translate request returns a 403 error?
I'm making a Java console application that needs to send an HTTP request to Google Translate and read the translation from the page it returns.
My problem is that I receive a 403 error when I try to read from a valid URL using openStream().
Creating an instance of this Translator class with Translator t = new Translator(); and calling, for example, t.translate("en", "ja", "cheese"); should return the translation the program finds on the page http://translate.google.com/#en|ja|cheese. Instead, it catches an IOException and returns this:
http://translate.google.com/#en|ja|cheese
Server returned HTTP response code: 403 for URL: http://translate.google.com/#en|ja|cheese
A similar error occurs with any other arguments that create a valid Google Translate URL.
A 403 error apparently means I am denied permission. That is what I want to know about: why can't I access this page, and what must I do in order to access it?
I've visited the site in my web browser and manually entered the address that my program tries to access, and it worked, so I'm not sure why my program cannot access the page. Typing or copy/pasting the address into my Firefox navigation bar works. If that is right, might the site want me to reach the page via a link on another page? If so, how would I go about that?
Here's the code, as I think it may help. The exception seems to be thrown when I try to create a BufferedReader from an InputStreamReader wrapping the InputStream returned by translationURL.openStream():
import java.io.*;
import java.net.*;

public class Translator {
    private final String googleTranslate = "http://translate.google.com/#";

    public String translate( String from, String to, String item ) {
        // Build the page address, e.g. http://translate.google.com/#en|ja|cheese
        String translation = googleTranslate + from + '|' + to + '|' + item;
        URL translationURL;
        try { translationURL = new URL(translation); }
        catch(MalformedURLException e) { return e.getMessage(); }

        BufferedReader httpin;
        String fullPage = "";
        System.out.println(translation);
        try {
            // The IOException (HTTP 403) is thrown here, by openStream()
            httpin = new BufferedReader(
                new InputStreamReader(translationURL.openStream()));
            String line;
            while((line=httpin.readLine()) != null) { fullPage += line + '\n'; }
            httpin.close();
        } catch(IOException e) { return e.getMessage(); }

        // Extract the text between <span class=""> (15 characters) and </span>
        int begin = fullPage.indexOf("<span class=\"\">");
        int end = fullPage.indexOf("</span>");
        return fullPage.substring(begin + 15, end);
    }

    public Translator() {}
}
I am testing this code in Eclipse (Galileo) on Ubuntu Linux 11.04, installed with Wubi, with a working and reliable wireless Internet connection. I've also tried running it from the command line, but the behavior was the same. Running java -version gives me this:
java version "1.6.0_22"
OpenJDK Runtime Environment (IcedTea6 1.10.2) (6b22-1.10.2-0ubuntu1~11.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode)
Comments (3)
They are looking at the user agent string, and presumably they don't want people doing this programmatically.
I did get your code working, but since Google charges for API access and is actively blocking things that are not browsers (based on the user agent string), I won't tell you how I did it.
A Google search for setting the user agent string in Java will get you what you want (as a matter of fact, I found the answer here on Stack Overflow).
Deceive "translate.google" by telling it that I'm a browser, not running code.
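As a minimal sketch of that idea, and not code taken from any of the answers: java.net.URLConnection lets a program set request headers before connecting, so a browser-style User-Agent can be supplied through setRequestProperty. The class name BrowserLikeFetcher, the User-Agent value, and the UTF-8 decoding are assumptions made for illustration, and whether the server accepts such a request is up to the site and its terms of use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name and constants are illustrative only.
final class BrowserLikeFetcher {
    // Assumed browser-style value; any string a real browser sends could be used.
    static final String USER_AGENT =
        "Mozilla/5.0 (X11; Linux x86_64; rv:5.0) Gecko/20100101 Firefox/5.0";

    /** Prepares a connection carrying a User-Agent header; nothing is sent yet. */
    static URLConnection openAsBrowser(String address) throws IOException {
        URLConnection conn = URI.create(address).toURL().openConnection();
        conn.setRequestProperty("User-Agent", USER_AGENT);
        return conn;
    }

    /** Reads the whole response body as text; this is the call that connects. */
    static String fetch(String address) throws IOException {
        URLConnection conn = openAsBrowser(address);
        StringBuilder page = new StringBuilder();
        // UTF-8 is an assumption; a real page may declare another charset.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    public static void main(String[] args) throws IOException {
        // Only inspects the prepared header; no network request is made here.
        URLConnection conn = openAsBrowser("http://translate.google.com/");
        System.out.println("User-Agent: " + conn.getRequestProperty("User-Agent"));
    }
}
```

In the question's Translator class, the equivalent change would be to replace translationURL.openStream() with a connection obtained from openConnection(), set the header on it, and then read from its getInputStream().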
Add a Referer header to the request, like
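One way this could look, as a minimal sketch and not the answerer's original snippet: the Referer header is set with setRequestProperty in the same way as any other request header. The class name RefererRequest and both header values are hypothetical, and it is an assumption that the server takes the Referer into account at all.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper class; the name and constants are illustrative only.
final class RefererRequest {
    // Assumed values: the page the request claims to come from, and a browser-style agent.
    static final String REFERER = "http://translate.google.com/";
    static final String USER_AGENT =
        "Mozilla/5.0 (X11; Linux x86_64; rv:5.0) Gecko/20100101 Firefox/5.0";

    /** Prepares an HTTP connection with Referer and User-Agent set; nothing is sent yet. */
    static HttpURLConnection prepare(String address) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(address).toURL().openConnection();
        conn.setRequestProperty("Referer", REFERER);
        conn.setRequestProperty("User-Agent", USER_AGENT);
        return conn;
    }

    /** Sends the request and returns the HTTP status code (for example 200 or 403). */
    static int status(String address) throws IOException {
        HttpURLConnection conn = prepare(address);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Only inspects the prepared headers; no network request is made here.
        HttpURLConnection conn = prepare("http://translate.google.com/");
        System.out.println("Referer: " + conn.getRequestProperty("Referer"));
        System.out.println("User-Agent: " + conn.getRequestProperty("User-Agent"));
    }
}
```

Calling status(...) on the prepared address is a quick way to check whether the added headers change the response code before parsing any page content.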