When to use Unicode (as opposed to non-Unicode!)

Posted on 2024-12-11 17:09:08

I haven't found much (concise) info about when exactly to use Unicode. I understand that many say best practice is to always use Unicode. But Unicode strings do have a larger memory footprint. Am I correct to say that Unicode must be used only when:

  • Printing something to the screen for anything other than local (for example, debugging) use.
  • Generally, sending any type of text across a network when the two ends are in different locales/countries.
  • You're not sure which to use.

I think it would be beneficial if someone (concisely) explained the basics of what actually happens with Unicode... am I correct to say that things get messy when:

  • the physical (byte) string gets sent to a machine that uses a different representation of strings from the sender (code page, others... this is already a detail, although an interesting one).
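To make that scenario concrete, here is a minimal sketch in Python (chosen only for brevity; the same thing happens in C++). The sender and receiver are simulated in one process, and the two encodings are illustrative assumptions, not anything specific to a real system.

```python
# Sender and receiver are simulated here; UTF-8 and Windows-1252 are
# illustrative choices for "two machines that disagree about the encoding".
text = "café"

wire_bytes = text.encode("utf-8")        # what the sender puts on the wire

wrong = wire_bytes.decode("cp1252")      # receiver assumes its local code page
right = wire_bytes.decode("utf-8")       # receiver uses the sender's encoding

print(wire_bytes)   # the raw bytes are identical in both cases
print(wrong)        # garbled text (mojibake)
print(right)        # the original text
```

The bytes never change; only the interpretation does, which is why both ends need to agree on the encoding (or state it explicitly alongside the data).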

The context is using Unicode in a programming language (say C++), but I hope answers to this question can be used for any encoding situation.
Also, I'm aware Unicode and NLS are not the same thing, but is it correct to say that NLS implies usage of Unicode?

P.S. awesome site

Comments (3)

清泪尽 2024-12-18 17:09:08

Always use Unicode; it will save you and others a lot of pain.

What you may be confusing this with is the issue of encoding. Unicode strings do not necessarily take more memory than the equivalent ASCII (or other encoding) strings; that depends a lot on the encoding used.
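As a rough illustration of how much the size depends on the encoding, the following Python sketch counts encoded bytes for two sample strings. The helper name `encoded_size` and the sample strings are made up for this example.

```python
def encoded_size(text, encoding):
    """Return how many bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))

# Hypothetical sample strings: one pure ASCII, one Japanese.
samples = ["hello", "あいう"]
encodings = ["utf-8", "utf-16-le", "utf-32-le"]

sizes = {s: {enc: encoded_size(s, enc) for enc in encodings} for s in samples}

for s, per_encoding in sizes.items():
    print(s, per_encoding)
```

For ASCII-only text, UTF-8 is exactly as compact as ASCII itself, while for the Japanese sample UTF-16 is smaller than UTF-8. So "Unicode takes more memory" is only true for particular combinations of text and encoding.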

Sometimes "Unicode" is used as a synonym for "UCS-2" or "UTF-16". Strictly speaking that use is wrong, because "Unicode" is the standard that defines the set of characters and their Unicode code points. It does not as such define a mapping to bytes (or words). UTF-16, UTF-8 and other encodings take over the job of mapping the characters to concrete bytes.
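The distinction between a code point and its encoded bytes can be seen in a short Python sketch. The euro sign is just an arbitrary example character.

```python
ch = "€"

# The code point is a property of the character in the Unicode standard.
code_point = ord(ch)
print(f"U+{code_point:04X}")

# The bytes depend on which encoding maps that code point to storage.
encoded = {enc: ch.encode(enc).hex() for enc in ("utf-8", "utf-16-be", "utf-32-be")}
for enc, hex_bytes in encoded.items():
    print(enc, hex_bytes)
```

One code point, three different byte sequences: the character's identity comes from Unicode, the bytes come from the encoding.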

逆夏时光 2024-12-18 17:09:08

The beauty of Unicode is that it frees you from restrictions and lots of headaches. Unicode is the largest character set available to date, i.e. it enables you to actually encode and use virtually any character of any halfway mainstream language in use today. With any other character set you need to think about whether it can actually encode a character or not. Latin-1 cannot encode the character "あ", Shift-JIS cannot encode the character "ڥ" and so on. Only if you're very sure you will never ever need anything other than basic Latin/Arabic/Japanese/whatever other subset of characters should you choose a specialized encoding such as Latin-1, Big5, Shift-JIS or ASCII.
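The two failures mentioned above can be checked with a small Python sketch. `can_encode` is a hypothetical helper written for this example, and the result reflects Python's bundled codecs.

```python
def can_encode(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

checks = [
    ("あ", "latin-1"),    # Japanese hiragana in a Western European charset
    ("あ", "shift_jis"),  # the same character in a Japanese charset
    ("ڥ", "shift_jis"),   # an Arabic-script letter in a Japanese charset
    ("あ", "utf-8"),
    ("ڥ", "utf-8"),
]

for text, encoding in checks:
    print(repr(text), encoding, can_encode(text, encoding))
```

With a legacy charset the answer depends on the character; with a Unicode encoding such as UTF-8 the question does not come up for ordinary text.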

Unicode is the most versatile charset available and therefore a good standard to adhere to.

The Unicode encodings are nothing special; they're just a little more complex in their bit representation, since they have to encode many more characters while still trying to be space efficient. For a very detailed excursion into this topic, please see What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text.

夜光 2024-12-18 17:09:08

I have a little utility which is sometimes helpful for seeing the difference between character encodings: http://sodved.awardspace.info/unicode.pl. If you paste ö into the Raw (UTF-8) field, you will see that it is represented by different byte sequences in different encodings. And as the other two good answers describe, some non-Unicode encodings cannot represent it at all.
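If the utility is not reachable, a similar comparison can be sketched locally in Python. This is not the linked script, just a hypothetical `byte_table` helper with an arbitrary selection of encodings.

```python
def byte_table(ch, encodings):
    """Map each encoding to the hex bytes for `ch`, or None if unrepresentable."""
    table = {}
    for enc in encodings:
        try:
            table[enc] = ch.encode(enc).hex()
        except UnicodeEncodeError:
            table[enc] = None   # this encoding has no slot for the character
    return table

table = byte_table("ö", ["utf-8", "utf-16-le", "latin-1", "cp437", "ascii"])

for enc, hex_bytes in table.items():
    print(f"{enc:10} {hex_bytes}")
```

The same letter comes out as a different byte sequence in each encoding, and ASCII has no representation for it at all.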
