App Engine URL fetch request: UTF-8 characters become '??' or '???'

Posted on 2024-12-20 09:18:03

I am getting an error while loading data from a web service into the datastore. The XML returned by the web service contains UTF-8 characters, and App Engine is not interpreting them correctly. It renders them as ??.

I'm fairly sure I've tracked this down to the URL Fetch request. The basic flow is: task queue -> fetch the web-service data -> put the data into the datastore, so it definitely has nothing to do with the request or response encoding of the main site.

I put log messages before and after Apache Digester to see if that was the cause, but determined it was not. This is what I saw in logs:

string from the XML: "Doppelg��nger"

After digester processed: "Doppelg??nger"

Here is my URL-fetching code:

public static String getUrl(String pageUrl) {
    StringBuilder data = new StringBuilder();
    log.info("Requesting: " + pageUrl);
    for(int i = 0; i < 5; i++) {
        try {
            URL url = new URL(pageUrl);
            URLConnection connection = url.openConnection();
            connection.connect();
            BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
            String line;
            while ((line = reader.readLine()) != null) {
                data.append(line);
            }
            reader.close();
            break;
        } catch (Exception e) {
            log.warn("Failed to load page: " + pageUrl, e);
        }
    }
    String resp = data.toString();
    if(resp.isEmpty()) {
        return null;
    }
    return resp;
}

Is there a way I can force this to recognize the input as UTF-8? I tested the page I am loading, and the W3C validator recognized it as valid UTF-8.

The issue occurs only on the App Engine servers; it works fine in the development server.

Thanks

Comments (2)

通知家属抬走 2024-12-27 09:18:03

Try passing the charset explicitly:

BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream(), "UTF-8"));
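
To show why the explicit charset matters, here is a minimal sketch. It assumes the cause is a different platform default charset on the production servers (US-ASCII is used below purely as an illustration). `InputStreamReader` without a charset argument falls back to that default. The class `CharsetDemo` and the helper `readAll` are hypothetical names, not part of the original code, and `StandardCharsets` needs Java 7 or later (on older runtimes the string "UTF-8" works as in the line above).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {

    // Hypothetical helper: reads the whole stream and decodes it with the
    // charset passed in, instead of relying on the platform default.
    public static String readAll(InputStream in, Charset charset) {
        StringBuilder data = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                data.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return data.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the web-service response body, encoded as UTF-8.
        byte[] body = "Doppelg\u00e4nger".getBytes(StandardCharsets.UTF_8);

        // Decoding with the right charset keeps the umlaut intact.
        String good = readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8);

        // Decoding the same bytes as US-ASCII (assumed default, for illustration)
        // turns each of the two bytes of the umlaut into U+FFFD.
        String bad = readAll(new ByteArrayInputStream(body), StandardCharsets.US_ASCII);

        System.out.println(good.equals("Doppelg\u00e4nger"));
        System.out.println(bad.equals("Doppelg\uFFFD\uFFFDnger"));
    }
}
```

The two replacement characters in the second string match the "Doppelg��nger" from the question's log, which is consistent with the bytes being decoded with a non-UTF-8 charset. In `getUrl`, the same idea would mean passing the charset to `new InputStreamReader(connection.getInputStream(), ...)`.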
迷迭香的记忆 2024-12-27 09:18:03

I ran into the same issue three months ago, Mike, and I would assume your problem is the same.
Let me recall what I did and write it down here. Feel free to add anything I miss.

My setup was Tomcat and Struts.
I resolved it through the correct configuration in Tomcat.
Basically, Tomcat itself has to support UTF-8 characters: set useBodyEncodingForURI on the connector. This covers GET parameters.

In addition, you can use a filter for POST parameters.
A good resource where you can find all of this under one roof is Click here!
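
As a rough illustration of why the decoding charset for GET parameters matters, the sketch below decodes the same percent-encoded query value twice using the standard `java.net.URLDecoder`. It does not use Tomcat itself. The class name `QueryDecodeDemo` and the sample value are assumptions for illustration, and ISO-8859-1 is used because, as far as I recall, older Tomcat versions defaulted to it for URIs.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class QueryDecodeDemo {

    // Hypothetical helper: decodes a percent-encoded value with the named charset.
    public static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // A browser sending UTF-8 encodes the umlaut as the two bytes C3 A4.
        String raw = "Doppelg%C3%A4nger";

        // Decoded as UTF-8, the two bytes become one character again.
        System.out.println(decode(raw, "UTF-8").equals("Doppelg\u00e4nger"));

        // Decoded as ISO-8859-1, each byte becomes its own character (mojibake).
        System.out.println(decode(raw, "ISO-8859-1").equals("Doppelg\u00c3\u00a4nger"));
    }
}
```

The point is only that the server has to decode with the charset the client used to encode; the connector setting and the POST filter mentioned above are the Tomcat-side ways of arranging that.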

After that I had a problem in production, where an Apache web server was redirecting requests to Tomcat :). UTF-8 had to be enabled there too. The moral of the story: resolve each problem as it comes :)
