What is the way to represent a unichar in Lua?
If I need the following Python value, the Unicode character with code point 0:
>>> unichr(0)
u'\x00'
How can I define it in Lua?
Comments (4)
There isn't one.
Lua has no concept of a Unicode value. Lua has no concept of Unicode at all. All Lua strings are sequences of 8-bit "characters", and all Lua string functions treat them as such. Lua does not treat strings as having any Unicode encoding; they are just sequences of bytes.
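As a rough illustration of the byte-oriented behaviour described above, the sketch below stores the two-byte UTF-8 encoding of U+00E9 (a character chosen here for illustration, not taken from the answer) and shows that the length operator and string.reverse both work on bytes rather than characters.

```lua
-- "\195\169" is the UTF-8 encoding of U+00E9 (e with acute accent);
-- the character is an illustrative choice, not part of the original answer.
local s = "\195\169"

print(#s)                    -- counts bytes, not characters
print(string.byte(s, 1, -1)) -- the individual byte values

-- string.reverse swaps the two bytes, so the result is no longer valid UTF-8.
local r = s:reverse()
print(r == "\169\195")
```

Nothing here is specific to a Lua version; the same behaviour is expected from Lua 5.1 onward.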
You can insert an arbitrary byte value into a string with a decimal escape, and the result is equivalent to writing the corresponding character literally.
The \ notation is followed by up to 3 decimal digits (or one of the escape characters), and the resulting value must be less than or equal to 255. Lua is perfectly capable of handling strings with embedded \000 characters.
But you cannot directly insert Unicode codepoints into Lua strings. You can decompose the codepoint into UTF-8 and use the above mechanism to insert the resulting bytes into a string. For example, the x character followed by the Unicode combining above arrow character can be written as x followed by the escaped bytes of that character's UTF-8 encoding.
But since no Lua functions actually understand UTF-8, you will have to expose some function that expects a UTF-8 string in order for it to be useful in any way.
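The escape mechanism and the UTF-8 decomposition described in this answer can be sketched as follows. This is a minimal illustration rather than the answer's original code: utf8_encode is a hypothetical helper written for this example, it does not validate its input, and U+20D7 (COMBINING RIGHT ARROW ABOVE) is assumed to be the "combining above arrow" character meant above.

```lua
-- Decimal escapes insert bytes by value: 65 and 66 are the codes of "A" and "B".
local ab = "\065\066"
print(ab == "AB")

-- The Python value u'\x00' from the question corresponds to a one-byte
-- string holding the value 0; embedded zeros do not end a Lua string.
local nul = "\0"
print(#nul, nul == string.char(0))
print(#"a\000b")

-- Hypothetical helper: encode one code point (0 .. 0x10FFFF) as UTF-8 bytes.
-- No range or surrogate checking is done; this is only a sketch.
local function utf8_encode(cp)
  local floor = math.floor
  if cp < 0x80 then
    return string.char(cp)
  elseif cp < 0x800 then
    return string.char(0xC0 + floor(cp / 0x40),
                       0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    return string.char(0xE0 + floor(cp / 0x1000),
                       0x80 + floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  else
    return string.char(0xF0 + floor(cp / 0x40000),
                       0x80 + floor(cp / 0x1000) % 0x40,
                       0x80 + floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  end
end

-- U+20D7 is an assumption about which arrow the answer refers to.
local arrow = utf8_encode(0x20D7)
print(arrow == "\226\131\151")

-- "x" followed by the combining arrow: four bytes in total.
local x_arrow = "x" .. arrow
print(#x_arrow)
```

The helper uses only arithmetic and string.char, so it should not depend on the bit operators or the utf8 library introduced in later Lua versions.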
How about
For a more modern answer, Lua 5.3 now has utf8.char, which takes code points and returns the corresponding UTF-8 encoded string.
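A short sketch of how utf8.char might be used for the value in the question. It assumes Lua 5.3 or later; the \u{...} escape and utf8.len shown alongside it are part of the same release, and U+20D7 is only an example code point.

```lua
-- Requires Lua 5.3 or later (utf8 library and \u{...} escapes).

-- Code point 0, the counterpart of Python's unichr(0).
local nul = utf8.char(0)
print(#nul, nul == "\0")

-- A code point above 127 is encoded as several bytes.
local arrow = utf8.char(0x20D7)
print(arrow == "\226\131\151")
print(arrow == "\u{20D7}")

-- utf8.char accepts several code points at once.
local s = utf8.char(0x78, 0x20D7)
print(#s)          -- length in bytes
print(utf8.len(s)) -- length in code points
```

On older interpreters the utf8 table does not exist, so this snippet would fail there with an attempt to index a nil value.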
While native Lua does not directly support or handle Unicode, its strings are really buffers of arbitrary bytes that by convention hold ASCII characters. Since strings may contain any byte values, it is relatively straightforward to build support for Unicode on top of native strings. Should byte buffers prove to be insufficiently robust for the purpose, one can also use a userdata object to hold anything, and with the addition of a suitable metatable, endow it with methods for creation, translation to a desired encoding, concatenation, iteration, and anything else that is needed.
There is a page at the Lua Users Wiki that discusses various ways to handle Unicode in Lua programs.
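The metatable approach described above could look roughly like this. It is a sketch under several assumptions: a plain table stands in for the userdata object (real userdata has to be created from C), UString and its methods are hypothetical names invented for this example, and the utf8 library from Lua 5.3 or later does the encoding and decoding.

```lua
-- Sketch only: a table with a metatable stands in for a userdata object.
-- UString is a hypothetical name; requires Lua 5.3+ for the utf8 library.
local UString = {}
UString.__index = UString

-- Creation: build a UString from a UTF-8 encoded Lua string.
-- utf8.codes raises an error if the input is not valid UTF-8.
function UString.from_utf8(s)
  local cps = {}
  for _, cp in utf8.codes(s) do
    cps[#cps + 1] = cp
  end
  return setmetatable({ cps = cps }, UString)
end

-- Number of code points (not bytes).
function UString:len()
  return #self.cps
end

-- Iteration over (index, code point) pairs.
function UString:codes()
  return ipairs(self.cps)
end

-- Translation back to a UTF-8 byte string, used by tostring().
function UString:__tostring()
  return utf8.char(table.unpack(self.cps))
end

-- Concatenation of two UString values into a new one.
-- Mixing a UString with a plain Lua string is not handled in this sketch.
function UString.__concat(a, b)
  local cps = {}
  for _, cp in ipairs(a.cps) do cps[#cps + 1] = cp end
  for _, cp in ipairs(b.cps) do cps[#cps + 1] = cp end
  return setmetatable({ cps = cps }, UString)
end

local u = UString.from_utf8("x\226\131\151") -- "x" plus U+20D7
local v = u .. UString.from_utf8("y")
print(u:len(), #tostring(u))
print(v:len())
```

Storing decoded code points trades memory for simple indexing; keeping the raw bytes and decoding on demand would be an equally reasonable design.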