What is the official encoding used by the Twitter Streaming API? Is it UTF-8?

Posted on 2024-12-18 05:52:01 · 426 characters · 0 views · 0 comments

What is the official encoding for Twitter's streaming API? My best guess is UTF-8 based on what I've seen, but I would like to avoid making assumptions.

The only part of the Twitter site I've seen where they even hint at what they use as their official encoding is here:

Twitter does not want to penalize a user for the fact we use UTF-8 or for the fact that the API client in question used the longer representation

https://dev.twitter.com/docs/counting-characters

Does anyone have a more "official" answer? I'm writing a state-machine tokenizer for the streaming API which makes certain assumptions. The last thing I want is to encounter something like UTF-16.

Thanks! :D

Comments (3)

梦纸 2024-12-25 05:52:02

At the moment, the Twitter API v2 does not send its data in UTF-8!

I believe it is UTF-16, because surrogate pairs remain after the data is decoded as UTF-8. Surrogate pairs only occur in UTF-16.

With the API I received, for example, this string: 🎁Crypto Heroez epic giveaway🎁

However, it did not arrive in that form, but rather as: \ud83c\udf81Crypto Heroez epic giveaway\ud83c\udf81

\ud83c\udf81 is a surrogate pair that translates into the gift emoji 🎁 (U+1F381).

In hex, that wrapped present is encoded in UTF-16BE as D8 3C DF 81; in UTF-8 the same emoji is encoded as F0 9F 8E 81.
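One caveat worth checking before concluding the transport is UTF-16: JSON `\u` escapes are always written as UTF-16 code units, whatever encoding the bytes on the wire use, so an escaped surrogate pair can appear inside a pure ASCII or UTF-8 body. The sketch below is a minimal Python illustration. The payload is hypothetical, modeled on the string above, and whether the v2 API actually sends escaped ASCII like this is an assumption, not something confirmed here.

```python
import json

# Hypothetical raw body: plain ASCII bytes in which the emoji is written
# as a JSON \u escape pair (an assumption modeled on the example above).
raw = b'{"text": "\\ud83c\\udf81Crypto Heroez epic giveaway\\ud83c\\udf81"}'

# The bytes decode cleanly as UTF-8 because they are all ASCII.
text = json.loads(raw.decode("utf-8"))["text"]

# The JSON parser combines the escaped surrogate pair into one code point.
gift = text[0]

print(hex(ord(gift)))                     # 0x1f381
print(gift.encode("utf-8").hex(" "))      # f0 9f 8e 81
print(gift.encode("utf-16-be").hex(" "))  # d8 3c df 81
```

If the literal characters `\ud83c\udf81` are still visible after processing, one possible explanation is that the body was decoded to text but never run through a JSON parser.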

Other developers noticed the same: https://twitterdevfeedback.uservoice.com/forums/930250-twitter-api/suggestions/41152342-utf-8-encoding-of-v2-api-responses

This issue was filed on Aug 15, 2020. As of writing, on September 9, 2021, they have not communicated anything publicly about it, which is why I wanted to post this answer here.

七度光 2024-12-25 05:52:01

One indicator is that the JSON format, which Twitter uses for virtually everything, dictates (or at least defaults to) UTF-8. They should also set an appropriate HTTP header denoting the encoding (but I haven't confirmed this). If you're using XML instead, the XML opening tag explicitly denotes the encoding, which is UTF-8.
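A rough way to confirm the header, sketched in Python: pull the `charset` parameter out of a `Content-Type` value. The function name `charset_from_content_type` and the sample header values are hypothetical, and falling back to UTF-8 when no charset is given is an assumption based on JSON's default, not on anything Twitter documents.

```python
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset declared in a Content-Type header value.

    Falls back to `default` when the header names no charset
    (an assumption: JSON bodies are normally UTF-8).
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the lowercased charset parameter, or None.
    return msg.get_content_charset() or default


print(charset_from_content_type("application/json; charset=UTF-8"))
print(charset_from_content_type("application/json"))
```

Feeding it the actual `Content-Type` header from a streaming response would show whether the server declares an encoding at all.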

ι不睡觉的鱼゛ 2024-12-25 05:52:01

If they say they use UTF-8, that's a pretty good bet. UTF-8 is very common, and UTF-16 in the wild is pretty rare from what I've seen.

There are also some clever libraries you could use if you were so inclined to prove it to yourself by testing whether they support various characters. The best of these is used by Firefox to detect the encoding of webpages as they're loaded: http://www-archive.mozilla.org/projects/intl/UniversalCharsetDetection.html
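Short of a full statistical detector, a much simpler check often settles the UTF-8 versus UTF-16 question. The Python sketch below is not the Mozilla algorithm: it only looks for a byte order mark and then tries a strict UTF-8 decode. The name `sniff_encoding` is hypothetical, and UTF-32 and legacy single-byte encodings are deliberately not handled.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess between UTF-8 and UTF-16 for a complete chunk of bytes."""
    # A byte order mark, when present, is the strongest signal.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Strict decoding fails on byte sequences that are not valid UTF-8.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8"


print(sniff_encoding("\U0001F381 giveaway".encode("utf-8")))
print(sniff_encoding("giveaway".encode("utf-16")))
```

On a stream, a chunk can end in the middle of a multi-byte character, so this check is only meaningful on complete messages.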
