How do I handle ISO-2022-JP (and other character sets) in Twitter updates?
Part of my application accepts arbitrary text and posts it as an update to Twitter. Everything works fine until someone posts text in a foreign (non-ASCII/UTF-7/UTF-8) character set; then things no longer work.
For example, if someone posts:
に投稿できる
Within my code (as seen in the Visual Studio debugger), it becomes:
=?ISO-2022-JP?B?GyRCJEtFajlGJEckLSRrGyhC?=
Googling has told me that this breaks down as follows (with the ? characters acting as delimiters):
=?ISO-2022-JP is the text encoding
?B means it is base64 encoded
?GyRCJEtFajlGJEckLSRrGyhC? is the encoded string
For the life of me, I can't figure out how to get this string posted as an update to Twitter in its original Japanese characters. As it stands now, sending '=?ISO-2022-JP?B?GyRCJEtFajlGJEckLSRrGyhC?=' to Twitter results in exactly that literal text getting posted. I've also tried breaking the string up into the pieces above and using System.Text.Encoding to convert from ISO-2022-JP to UTF-8 and vice versa, both with and without base64 decoding. Additionally, I've played around with the URL encoding of the status update, like this:
    string[] bits = tweetText.Split(new char[] { '?' });
    if (bits.Length >= 4)
    {
        textEncoding = System.Text.Encoding.GetEncoding(bits[1]);
        xml = oAuth.oAuthWebRequest(TwitterLibrary.oAuthTwitter.Method.POST, url,
            "status=" + System.Web.HttpUtility.UrlEncode(decodedText, textEncoding));
    }
No matter what I do, the results never end up back to normal.
EDIT:
Got it in the end. For those following along at home, the solution ended up pretty close to the answer listed below. The Visual Studio debugger had been steering me the wrong way, and there was also a bug in the Twitter library I was using. The end result was this:
decodedText = textEncoding.GetString(System.Convert.FromBase64String(bits[3]));
byte[] originalBytes = textEncoding.GetBytes(decodedText);
byte[] utfBytes = System.Text.Encoding.Convert(textEncoding, System.Text.Encoding.UTF8, originalBytes);
// now, back to string form
decodedText = System.Text.Encoding.UTF8.GetString(utfBytes);
Thanks all.
Comments (2)
This produced the output you are looking for:
Standards are great; there are so many to choose from. ISO never disappoints: there are no fewer than three ISO-2022-JP encodings. If you have trouble, also try encodings 50221 and 50222.
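The "try another variant if the first one fails" advice can be sketched as a small fallback loop. This is only an illustration in Python, not the code from this answer: the codec names below are Python's own ISO-2022-JP variants, used here as stand-ins, and they are not claimed to correspond one-to-one to the .NET code pages 50220, 50221 and 50222. The helper name `decode_with_fallback` is hypothetical.

```python
import base64

# Assumption: Python codec names standing in for "the several ISO-2022-JP
# variants". They are NOT claimed to match .NET code pages 50220/50221/50222.
CANDIDATE_CODECS = ("iso2022_jp", "iso2022_jp_ext", "iso2022_jp_2004")


def decode_with_fallback(payload_b64, codecs=CANDIDATE_CODECS):
    """Base64-decode the payload, then try each candidate codec in order.

    Returns (decoded_text, codec_name) for the first codec that succeeds.
    """
    raw = base64.b64decode(payload_b64)
    for name in codecs:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue  # this variant rejected the bytes; try the next one
    raise ValueError("none of the candidate codecs could decode the payload")


# Payload taken from the question's encoded string.
text, used = decode_with_fallback("GyRCJEtFajlGJEckLSRrGyhC")
print(used, text)
```

For the payload in the question the first candidate should already succeed, so the loop only matters for text that uses characters outside the base variant.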
Your understanding of how the text is encoded seems correct. In Python, base64-decoding the payload and then decoding the resulting bytes as ISO-2022-JP returns the correct Unicode string. Note that you need to decode the base64 first in order to get the ISO-2022-JP-encoded text.
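A minimal sketch of the two steps described above (base64 first, then ISO-2022-JP), using only the Python standard library. It is not necessarily the snippet this answer had in mind: it relies on `email.header.decode_header`, which parses the `=?charset?B?payload?=` form and undoes the base64 for you, and the helper name `decode_mime_words` is hypothetical.

```python
from email.header import decode_header


def decode_mime_words(value):
    """Decode a string that may contain =?charset?B?...?= encoded words."""
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # decode_header has already removed the base64 layer; what is
            # left are raw bytes in the named charset (e.g. iso-2022-jp).
            parts.append(chunk.decode(charset or "ascii"))
        else:
            # Input without any encoded words comes back as a plain str.
            parts.append(chunk)
    return "".join(parts)


decoded = decode_mime_words("=?ISO-2022-JP?B?GyRCJEtFajlGJEckLSRrGyhC?=")
# Once decoded, the text is an ordinary Unicode string; encode it as UTF-8
# only at the point where bytes are needed (for example, when building the
# request body).
utf8_bytes = decoded.encode("utf-8")
print(decoded)
```

The point of the design is to treat the charset conversion as "bytes to Unicode string" once, and then let the HTTP layer handle UTF-8, rather than converting between byte encodings by hand.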