C# encoding problem with HttpWebRequest
I am getting HTML character codes (&#39; and &quot;) that break my responses (they show up as 39; and uto;) when returning a string from an HttpWebRequest:
internal static void TranslateThis(Player player, string fromLang, string toLang, string text)
{
    try
    {
        string translated = null;
        HttpWebRequest hwr = (HttpWebRequest)HttpWebRequest.Create("http://translate.google.com/?langpair=" + fromLang + "|" + toLang + "&text=" + text.Replace(" ", "+") + "#");
        HttpWebResponse res = (HttpWebResponse)hwr.GetResponse();
        StreamReader sr = new StreamReader(res.GetResponseStream());
        string html = sr.ReadToEnd();
        int a = html.IndexOf("onmouseout=\"this.style.backgroundColor='#fff'\">") + 47;
        int b = html.IndexOf("</span>", html.IndexOf("onmouseout=\"this.style.backgroundColor='#fff'\">") + 47);
        translated = html.Substring(a, b - a);
        if (translated.Length < (10 * text.Length))
        {
            if (player == Player.Console)
            {
                player.ParseMessage(translated, true);
            }
            else
            {
                player.ParseMessage(translated, false);
            }
        }
        else
        {
            player.Message("Usage: /translate [lang] [message]");
        }
    }
    catch
    {
        player.Message("Usage: /translate [lang] [message]");
    }
}
First of all, make sure you read the downloaded content with the correct encoding. See this SO answer for code on how to do this.
Basically, check both the HTTP headers and the meta tags for the encoding, and re-encode the content if necessary. Then call HttpUtility.HtmlDecode to turn any HTML-encoded characters back into plain text. Now you are ready to start searching for whatever content you are trying to find.
I would also recommend using something like Html Agility Pack to parse the HTML instead of IndexOf.
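To illustrate the first step, here is a minimal sketch of choosing the encoding from the charset reported in the HTTP header. The names ResponseDecoding, ResolveEncoding and DecodeBody are made up for this example, the UTF-8 fallback is an assumption, and the meta-tag check mentioned above is not covered:

```csharp
using System;
using System.Net;
using System.Text;

static class ResponseDecoding
{
    // Hypothetical helper: maps a charset name (for example the value of
    // HttpWebResponse.CharacterSet) to an Encoding. Falling back to UTF-8
    // is an assumption of this sketch, not something the answer prescribes.
    public static Encoding ResolveEncoding(string charset)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return Encoding.UTF8;
        try
        {
            // Some servers send the charset wrapped in quotes.
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported charset name.
            return Encoding.UTF8;
        }
    }

    // Decodes the raw response bytes with the resolved encoding.
    public static string DecodeBody(byte[] body, string charset)
    {
        return ResolveEncoding(charset).GetString(body);
    }
}

static class Program
{
    static void Main()
    {
        // The bytes of "café" in ISO-8859-1; 0xE9 is not valid UTF-8 on its own.
        byte[] body = { 0x63, 0x61, 0x66, 0xE9 };
        Console.WriteLine(ResponseDecoding.DecodeBody(body, "iso-8859-1"));
    }
}
```

With a real response, the same helper could probably be used as new StreamReader(res.GetResponseStream(), ResponseDecoding.ResolveEncoding(res.CharacterSet)), so that ReadToEnd() returns correctly decoded text.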
It is hard to say exactly what your ParseMessage method expects, so this is just a guess: the result you are getting from Google Translate is HTML, which means that if you want plain-text output, you have to convert the HTML to text. You have successfully extracted the translation from the HTML page (for now, at least, until Google Translate changes its output page a tiny bit; your solution is not exactly fool- or future-proof). But the translation is still HTML-encoded and you need to decode it. For that, you can use the WebUtility.HtmlDecode method (assuming you are using .NET Framework 4). After the line, add
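As a rough sketch of the decoding step, the extraction from the question can be followed by a call to WebUtility.HtmlDecode. The helper name TranslationExtractor.ExtractTranslation and the sample HTML are invented for illustration; the marker string is copied from the question and may not match what Google Translate returns today:

```csharp
using System;
using System.Net;

static class TranslationExtractor
{
    // Marker copied from the question's code; the real page markup may differ.
    const string Marker = "onmouseout=\"this.style.backgroundColor='#fff'\">";

    // Hypothetical helper: returns the decoded text between the marker and the
    // next </span>, or null when the page does not have the expected shape.
    public static string ExtractTranslation(string html)
    {
        int markerPos = html.IndexOf(Marker, StringComparison.Ordinal);
        if (markerPos < 0) return null;

        int start = markerPos + Marker.Length;
        int end = html.IndexOf("</span>", start, StringComparison.Ordinal);
        if (end < 0) return null;

        string translated = html.Substring(start, end - start);

        // Turn entities such as &#39; and &quot; back into ' and ".
        return WebUtility.HtmlDecode(translated);
    }
}

static class Program
{
    static void Main()
    {
        // Made-up sample input standing in for the downloaded page.
        string html = "<span onmouseout=\"this.style.backgroundColor='#fff'\">It&#39;s &quot;fine&quot;</span>";
        Console.WriteLine(TranslationExtractor.ExtractTranslation(html)); // prints: It's "fine"
    }
}
```

Returning null when the marker is missing avoids the exception that Substring would otherwise throw on an unexpected page, which the question's code currently hides behind a bare catch.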
Discussions with another developer got me to try this before the last lot of comments. Here is what ended up working: