C# encoding problem using HttpWebRequest

Posted on 2024-10-18 10:46:53

When returning a string from an HttpWebRequest, I get character codes (&#39; and &quot;) that break my responses (they show up as 39; and uot;):

internal static void TranslateThis(Player player, string fromLang, string toLang, string text){
    try
    {
        string translated = null;
        HttpWebRequest hwr = (HttpWebRequest)HttpWebRequest.Create("http://translate.google.com/?langpair=" + fromLang + "|" + toLang + "&text=" + text.Replace(" ", "+") + "#");
        HttpWebResponse res = (HttpWebResponse)hwr.GetResponse();
        StreamReader sr = new StreamReader(res.GetResponseStream());
        string html = sr.ReadToEnd();
        int a = html.IndexOf("onmouseout=\"this.style.backgroundColor='#fff'\">") + 47;
        int b = html.IndexOf("</span>",html.IndexOf("onmouseout=\"this.style.backgroundColor='#fff'\">") + 47);
        translated = html.Substring(a, b - a);
        if (translated.Length < (10 * text.Length)){
            if (player == Player.Console)
            {
                player.ParseMessage(translated, true);
            }
            else
            {
                player.ParseMessage(translated, false);
            }
        } else {
            player.Message("Usage: /translate [lang] [message]");
        }
    }
    catch
    {
        player.Message("Usage: /translate [lang] [message]");
    }
}

Comments (3)

浅忆 2024-10-25 10:46:53

First of all make sure you get the correct encoding of the downloaded content. See this SO answer for code on how to do this.

Basically, check both the HTTP headers and the meta tags for the encoding, and re-encode the content if necessary. Then call HttpUtility.HtmlDecode to get rid of any HTML-encoded characters. Now you are ready to start searching for whatever content you are trying to find.
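
Here is a rough sketch of the header half of that advice. It assumes the charset string comes from the response (in the question's code that would be res.CharacterSet). The ResponseDecoding class and its method names are made up for illustration, it uses WebUtility.HtmlDecode rather than HttpUtility so that no System.Web reference is needed, and the meta-tag check is not covered.

```csharp
using System;
using System.Net;
using System.Text;

static class ResponseDecoding
{
    // Picks the encoding named by the charset, falling back to UTF-8
    // when the name is missing or not recognised.
    static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset)) return Encoding.UTF8;
        try { return Encoding.GetEncoding(charset.Trim('"', ' ')); }
        catch (ArgumentException) { return Encoding.UTF8; }
    }

    // Decodes the raw bytes with the declared charset, then turns
    // HTML entities such as &#39; and &quot; back into characters.
    public static string DecodeBody(byte[] body, string charset)
    {
        string html = ResolveEncoding(charset).GetString(body);
        return WebUtility.HtmlDecode(html);
    }

    static void Main()
    {
        // Stand-in for bytes read from the response stream.
        byte[] body = Encoding.UTF8.GetBytes("It&#39;s &quot;fine&quot;");
        Console.WriteLine(DecodeBody(body, "utf-8")); // It's "fine"
    }
}
```

For a whole page you would normally decode entities only on the extracted fragment, not on the full HTML, so that markup characters are not confused with decoded text.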

I would also recommend using something like Html Agility Pack to parse the HTML instead of IndexOf.
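
As an illustration of that suggestion, here is a minimal sketch using the HtmlAgilityPack NuGet package (a third-party library, so it has to be installed separately). The XPath selector is an assumption based on the marker string in the question, and the AgilityPackSketch class is hypothetical.

```csharp
using System;
using System.Net;
using HtmlAgilityPack; // third-party NuGet package, not part of .NET itself

static class AgilityPackSketch
{
    // Returns the decoded text of the first <span> carrying an onmouseout
    // attribute, or null when no such element exists.
    public static string ExtractTranslation(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumed selector: the question's marker is a span with onmouseout.
        HtmlNode span = doc.DocumentNode.SelectSingleNode("//span[@onmouseout]");
        if (span == null) return null;

        // Decode entities explicitly so &#39; and &quot; become real characters.
        return WebUtility.HtmlDecode(span.InnerText);
    }

    static void Main()
    {
        // Stand-in for the downloaded page.
        string html = "<div><span onmouseout=\"this.style.backgroundColor='#fff'\">" +
                      "C&#39;est &quot;bon&quot;</span></div>";
        Console.WriteLine(ExtractTranslation(html));
    }
}
```

Selecting by element and attribute tends to survive small changes in the page better than counting characters past a fixed string, although it still breaks if the markup changes substantially.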

薄荷梦 2024-10-25 10:46:53

It is hard to say exactly what your ParseMessage method expects, so this is just a guess:

The result you are getting from Google Translate is HTML, which means that if you want plain-text output, you have to convert the HTML to text. You have successfully extracted the translation from the HTML page (for now, at least, until Google Translate changes its output page a tiny bit; your solution is not exactly fool- or future-proof). But the translation is still HTML-encoded and you need to decode it. For that, you can use the WebUtility.HtmlDecode method (assuming you are using .NET Framework 4). After the

translated = html.Substring(a, b - a);

line, add

translated = WebUtility.HtmlDecode(translated);
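
Put together, the extract-then-decode step might look like the sketch below. It runs against a hard-coded sample string rather than a live page; the TranslationExtractor class and the sample HTML are assumptions for illustration. The marker string is the one from the question, and its length is taken from the string itself instead of the hard-coded 47.

```csharp
using System;
using System.Net;

static class TranslationExtractor
{
    // Marker copied from the question; the translation follows it.
    const string Marker = "onmouseout=\"this.style.backgroundColor='#fff'\">";

    // Returns the decoded translation, or null if the page layout differs.
    public static string Extract(string html)
    {
        int markerPos = html.IndexOf(Marker, StringComparison.Ordinal);
        if (markerPos < 0) return null;

        int start = markerPos + Marker.Length;
        int end = html.IndexOf("</span>", start, StringComparison.Ordinal);
        if (end < 0) return null;

        // The fragment is still HTML-encoded, so decode it before use.
        return WebUtility.HtmlDecode(html.Substring(start, end - start));
    }

    static void Main()
    {
        string html = "<span " + Marker + "C&#39;est &quot;bon&quot;</span>";
        Console.WriteLine(Extract(html)); // C'est "bon"
    }
}
```

Checking for a missing marker avoids relying on the catch block to handle an IndexOf result of -1.
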
绝不放开 2024-10-25 10:46:53

Discussions with another developer got me to try this before the last lot of comments. Here is what ended up working:

    internal static void TranslateThis(Player player, string fromLang, string toLang, string text){
        try
        {
            string translated = null;
            text = Regex.Replace(text, @"[^\w\.\'\s@-]", "");
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://translate.google.com/?langpair=" + fromLang + "|" + toLang + "&text=" + text.Replace(" ", "+") + "#");

            request.MaximumAutomaticRedirections = 4;
            request.MaximumResponseHeadersLength = 4;

            request.Credentials = CredentialCache.DefaultCredentials;
            HttpWebResponse response = (HttpWebResponse)request.GetResponse();

            Stream receiveStream = response.GetResponseStream();

            StreamReader readStream = new StreamReader(receiveStream, Encoding.UTF7);
            String html = readStream.ReadToEnd() + "";
            int a = html.IndexOf("onmouseout=\"this.style.backgroundColor='#fff'\">") + 47;
            int b = html.IndexOf("</span>",html.IndexOf("onmouseout=\"this.style.backgroundColor='#fff'\">") + 47);
            translated = html.Substring(a, b - a);
            response.Close();
            readStream.Close();
            if (translated.Length < (10 * text.Length))
            {
                translated = translated.Replace("&#39;", "'");
                translated = Regex.Replace(translated, @"[^\w\.\'\s@-]", "");
                if (player == Player.Console)
                {
                    player.ParseMessage(translated, true);
                }
                else
                {
                    player.ParseMessage(translated, false);
                }
            }
            else
            {
                player.Message("Usage: /translate [lang] [message]");
            }
        }
        catch(Exception ex)
        {
            player.Message("Error:" + ex.ToString());

        }
   }