Lua, XML, UTF-8
I'm using the luaxml library to generate XML files from Lua tables after database selects. Everything works, but my database (MySQL) contains Russian characters. What do I need to do so that luaxml writes these characters as real characters rather than as numeric codes (such as &#208;)?
I found the function xml.registerCode(decoded,encoded), but I don't understand how to use it :(
Or maybe I need to use another library. If so, which one?
Comments (2)
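I've looked inside the library: it forcibly encodes every byte above 127, which breaks each UTF-8 sequence into separate characters. It does this after applying its built-in .registerCode mechanism, so you can't even override it. If you need to encode some complex data structure, you can simply unroll all those entity substitutions after LuaXML has finished stringifying, by declaring somewhere:
and then using gsub on the final string: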
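The snippets for this answer did not survive on this page, so here is a minimal sketch of the gsub approach it describes, not the answerer's original code. The helper name unroll_entities and the sample string are hypothetical. It assumes the library emits one decimal entity of the form &#NNN; for each byte above 127.

```lua
-- Hypothetical helper (not part of LuaXML): turn "&#NNN;" entities that stand
-- for single bytes above 127 back into raw bytes, which restores the original
-- multi-byte UTF-8 sequences.
local function unroll_entities(s)
  -- The outer parentheses drop gsub's second return value (the match count).
  return (s:gsub("&#(%d+);", function(digits)
    local n = tonumber(digits)
    if n > 127 and n < 256 then
      return string.char(n)
    end
    -- Returning nil tells gsub to keep the original match untouched.
    return nil
  end))
end

-- Sample input (an assumption for illustration): the Cyrillic letter "Я" is
-- the two bytes 208 and 175 in UTF-8, so a byte-wise encoder yields two entities.
local escaped = "<name>&#208;&#175;</name>"
local restored = unroll_entities(escaped)
print(restored)
```

Entities below 128 are deliberately left alone, so anything that was escaped for XML syntax reasons stays escaped. Run this on the final string only, after the library has finished serializing.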
Looking inside LuaXML_lib.c, there is a function called char2code() that replaces characters outside the ASCII range with numeric entities. You can "unbreak" it by replacing the function with the following:
This stops it from replacing any invalid characters with entities. It is then up to you to make sure there are no invalid characters in your input, but it definitely won't mangle your UTF-8 any more.