What is the way to represent a unichar in Lua

Posted on 2024-12-09 22:07:15 · 120 words · 0 views · 0 comments

If I need the following Python value, the Unicode character 0:

>>> unichr(0)
u'\x00'

How can I define it in Lua?

Comments (4)

夏末的微笑 2024-12-16 22:07:15

There isn't one.

Lua has no concept of a Unicode value. The core language has no concept of Unicode at all. All Lua strings are sequences of 8-bit "characters", and all Lua string functions will treat them as such. Lua does not treat strings as having any Unicode encoding; they're just a sequence of bytes.

You can insert an arbitrary byte value into a string with a decimal escape. For example:

"\065\066"

is equivalent to:

"AB"

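The \ is followed by up to three decimal digits (or by one of the escape characters), and the resulting value must be less than or equal to 255. Lua is perfectly capable of handling strings with embedded \000 characters, so the Python value in the question corresponds to the one-byte string "\0".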
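
As a minimal sketch of the escapes described above (the variable names are only illustrative), the following checks that decimal escapes produce the expected bytes and that a zero byte is an ordinary string element:

```lua
-- Decimal escapes take one to three digits; the value must be at most 255.
local ab = "\065\066"
assert(ab == "AB")

-- A string holding a single zero byte, the closest analogue of Python's unichr(0).
local nul = "\0"
assert(#nul == 1)               -- length is one byte
assert(nul:byte(1) == 0)        -- and that byte is zero
assert(nul == string.char(0))   -- string.char builds the same string from a number

-- An embedded zero byte does not terminate the string.
local s = "a\0b"
assert(#s == 3)
```

string.char is handy when the byte value is only known at run time, while the escape form suits literals.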

But in Lua versions before 5.3 (which added the \u{XXX} escape) you cannot directly insert Unicode code points into Lua strings. You can encode the code point as UTF-8 and use the above mechanism to insert the resulting bytes into a string. For example:

"x\226\131\151"

This is the x character followed by the Unicode combining right arrow above character (U+20D7).

But since Lua's string functions do not actually understand UTF-8, you will have to expose some function that expects a UTF-8 string in order for it to be useful in any way.
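
The "encode the code point as UTF-8" step can be sketched by hand. The helper below, utf8_encode, is a hypothetical name and not part of any Lua library; it uses only arithmetic so it should also run on versions without bitwise operators, and it assumes the argument is an integer in the range 0 to 0x10FFFF (it does not reject surrogates):

```lua
-- Hypothetical helper: encode one code point (0 .. 0x10FFFF) as a UTF-8 byte string.
local function utf8_encode(cp)
    local floor = math.floor
    if cp < 0x80 then
        -- 1 byte: 0xxxxxxx
        return string.char(cp)
    elseif cp < 0x800 then
        -- 2 bytes: 110xxxxx 10xxxxxx
        return string.char(0xC0 + floor(cp / 0x40),
                           0x80 + cp % 0x40)
    elseif cp < 0x10000 then
        -- 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return string.char(0xE0 + floor(cp / 0x1000),
                           0x80 + floor(cp / 0x40) % 0x40,
                           0x80 + cp % 0x40)
    else
        -- 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return string.char(0xF0 + floor(cp / 0x40000),
                           0x80 + floor(cp / 0x1000) % 0x40,
                           0x80 + floor(cp / 0x40) % 0x40,
                           0x80 + cp % 0x40)
    end
end

-- U+20D7 encodes to the three bytes used in the example above.
assert("x" .. utf8_encode(0x20D7) == "x\226\131\151")
-- Code point 0 is a single zero byte.
assert(utf8_encode(0) == "\0")
```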

伪心 2024-12-16 22:07:15

How about

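-- Returns a Python-style escaped representation of a code point (text, not the raw character).
function unichr(ord)
    if ord == nil then return nil end
    if ord < 32 then return string.format('\\x%02x', ord) end         -- control characters
    if ord < 127 then return string.char(ord) end                     -- printable ASCII
    if ord < 0x10000 then return string.format('\\u%04x', ord) end    -- rest of the BMP
    if ord <= 0x10FFFF then return string.format('\\U%08x', ord) end  -- supplementary planes
    return nil                                                        -- outside the Unicode range
end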
信仰 2024-12-16 22:07:15

For a more modern answer, Lua 5.3 and later provide utf8.char. From the reference manual:

Receives zero or more integers, converts each one to its corresponding UTF-8 byte sequence and returns a string with the concatenation of all these sequences.
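
A short sketch of how this might look for the value in the question, assuming Lua 5.3 or later (the utf8 library and the \u{XXX} escape are not available in 5.1 or 5.2); the variable names are illustrative:

```lua
-- Requires Lua 5.3+.
local nul = utf8.char(0)                 -- code point 0 becomes a single zero byte
assert(nul == "\0")

local arrow = "x" .. utf8.char(0x20D7)   -- x followed by U+20D7
assert(arrow == "x\226\131\151")

-- The \u{XXX} escape writes the same bytes directly in a literal.
assert("\u{20D7}" == utf8.char(0x20D7))

-- # counts bytes, utf8.len counts code points.
assert(#arrow == 4)
assert(utf8.len(arrow) == 2)

-- utf8.codepoint reads back the code point that starts at byte position 2.
assert(utf8.codepoint(arrow, 2) == 0x20D7)
```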

無心 2024-12-16 22:07:15

While native Lua does not directly support or handle Unicode, its strings are really buffers of arbitrary bytes that by convention hold ASCII characters. Since strings may contain any byte values, it is relatively straightforward to build support for Unicode on top of native strings. Should byte buffers prove to be insufficiently robust for the purpose, one can also use a userdata object to hold anything, and with the addition of a suitable metatable, endow it with methods for creation, translation to a desired encoding, concatenation, iteration, and anything else that is needed.

There is a page on the lua-users wiki that discusses various ways to handle Unicode in Lua programs.
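
The metatable idea from this answer can be sketched in pure Lua. Note the substitution: userdata can only be created from the C side, so this sketch uses an ordinary table of code points instead. UString and its methods are hypothetical names, and the conversions assume the utf8 library from Lua 5.3 or later:

```lua
-- Hypothetical UString type: an array of code points with a metatable (Lua 5.3+).
local UString = {}
UString.__index = UString

-- Build from a list of code points.
function UString.new(...)
    return setmetatable({ ... }, UString)
end

-- Build from a UTF-8 encoded Lua string.
function UString.from_utf8(s)
    local cps = {}
    for _, cp in utf8.codes(s) do
        cps[#cps + 1] = cp
    end
    return setmetatable(cps, UString)
end

-- Translate back to UTF-8 bytes.
UString.__tostring = function(self)
    return utf8.char(table.unpack(self))
end

-- Concatenation; both operands are assumed to be UString values.
UString.__concat = function(a, b)
    local out = {}
    for _, cp in ipairs(a) do out[#out + 1] = cp end
    for _, cp in ipairs(b) do out[#out + 1] = cp end
    return setmetatable(out, UString)
end

local s = UString.new(0x78, 0x20D7) .. UString.from_utf8("\195\169")
assert(#s == 3)                                   -- three code points
assert(tostring(s) == "x\226\131\151\195\169")    -- seven UTF-8 bytes
```

Storing code points rather than bytes makes indexing and length per character trivial, at the cost of converting whenever the text crosses into byte-oriented APIs.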
