App Engine URL request: UTF-8 characters become '??' or '???'
I get an error when loading data from a web service into the datastore. The problem is that the XML returned by the web service contains UTF-8 characters, and App Engine does not interpret them correctly. It renders them as ??.
I'm fairly sure I've tracked this down to the URL Fetch request. The basic flow is: task queue -> fetch the web service data -> put the data into the datastore, so it has nothing to do with the request or response encoding of the main site.
I put log messages before and after Apache Digester to see whether that was the cause, and determined it was not. This is what I saw in the logs:
string from the XML: "Doppelg��nger"
After digester processed: "Doppelg??nger"
Here is my URL fetching code:
    public static String getUrl(String pageUrl) {
        StringBuilder data = new StringBuilder();
        log.info("Requesting: " + pageUrl);
        // Retry the request up to 5 times before giving up.
        for (int i = 0; i < 5; i++) {
            try {
                URL url = new URL(pageUrl);
                URLConnection connection = url.openConnection();
                connection.connect();
                BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
                String line;
                while ((line = reader.readLine()) != null) {
                    data.append(line);
                }
                reader.close();
                break;
            } catch (Exception e) {
                log.warn("Failed to load page: " + pageUrl, e);
            }
        }
        // Return null when nothing could be read.
        String resp = data.toString();
        if (resp.isEmpty()) {
            return null;
        }
        return resp;
    }
Is there a way I can force this to recognize the input as UTF-8? I tested the page I am loading, and the W3C validator recognized it as valid UTF-8.
The issue occurs only on the App Engine servers; it works fine on the development server.
Thanks
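One likely cause, given that the code above builds its InputStreamReader without naming a charset, is that the reader falls back to the JVM's default charset, which may differ between the development server and the production servers. The following is a minimal sketch of forcing UTF-8 decoding, not a confirmed fix for this setup. The class name Utf8FetchSketch and the helper readUtf8 are hypothetical names chosen for illustration, and the byte array stands in for the real web service response.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
public class Utf8FetchSketch {

    // Reads the whole stream and decodes it as UTF-8,
    // independent of the JVM's default charset.
    public static String readUtf8(InputStream in) throws IOException {
        StringBuilder data = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                data.append(line);
            }
        }
        return data.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the response body: UTF-8 bytes containing a non-ASCII character.
        byte[] body = "<name>Doppelg\u00e4nger</name>".getBytes(StandardCharsets.UTF_8);
        String decoded = readUtf8(new ByteArrayInputStream(body));
        // Compare against the expected string instead of printing it,
        // so the console encoding does not affect the check.
        System.out.println(decoded.equals("<name>Doppelg\u00e4nger</name>"));
    }
}
```

In getUrl this would amount to passing connection.getInputStream() to such a helper. On runtimes without StandardCharsets, the two-argument constructor new InputStreamReader(in, "UTF-8") takes the charset name as a string instead.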
try
I ran into the same issue three months back, Mike. It looks like your problem is the same as mine, or so I would assume.
Let me recollect and put it down here. Feel free to add if I miss something.
My setup was Tomcat and Struts.
The way I resolved it was through the correct configuration in Tomcat.
Basically, Tomcat itself has to support UTF-8 characters: set useBodyEncodingForURI in the connector. This is for GET params.
Plus, you can use a filter for POST params.
A good resource where you can find all of this under one roof is: Click here!
After that I had a problem in production, where I had an Apache web server redirecting requests to Tomcat :). Similarly, UTF-8 has to be enabled there too. The moral of the story: resolve each problem as it comes :)
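To illustrate why the charset used for GET parameters matters, here is a small sketch using only the standard library. It shows the same percent-encoded query value decoded once as UTF-8 and once as ISO-8859-1, which, as far as I recall, was the default URI encoding in older Tomcat versions. The class name QueryDecodingSketch and its decode helper are hypothetical. This demonstrates the general effect, not Tomcat's internal behavior.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, for illustration only.
public class QueryDecodingSketch {

    // Decodes a percent-encoded value using the named charset.
    public static String decode(String raw, String charset)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(raw, charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // A client that encodes the parameter as UTF-8 sends two bytes for the umlaut.
        String raw = URLEncoder.encode("Doppelg\u00e4nger", "UTF-8");
        System.out.println(raw);

        // Decoding with the matching charset restores the original text.
        System.out.println(decode(raw, "UTF-8").equals("Doppelg\u00e4nger"));

        // Decoding with ISO-8859-1 turns each UTF-8 byte into its own character.
        System.out.println(decode(raw, "ISO-8859-1").equals("Doppelg\u00c3\u00a4nger"));
    }
}
```

Both comparisons hold: the UTF-8 decode round-trips, while the ISO-8859-1 decode yields two characters in place of the single umlaut. This is the kind of mismatch that a connector-level encoding setting is meant to avoid.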