Active Record serialized attr losing string encoding (probably YAML issue), workarounds?
I'm using Rails 2.3.8 with Ruby 1.9.1 and I'm having a problem with
serialized attributes in Active Record not preserving string encodings.
The underlying problem is probably YAML, but I'm wondering if anyone has
any good ideas on how to handle this. The app I'm working on has
numerous serialized fields, some of which contain deep structures of
arrays and hashes. Getting back an ASCII-8BIT string (that's actually
UTF-8) deep within those structures wreaks havoc later...
Perhaps best illustrated by example, if I save l to a serialized attr in
an active record model I'll get back l2 on reading from the database.
>> l
=> ["English", "Türkçe", "Русский"]
>> l.map(&:encoding)
=> [#<Encoding:UTF-8>, #<Encoding:UTF-8>, #<Encoding:UTF-8>]
>> l.map(&:valid_encoding?)
=> [true, true, true]
>> l.to_yaml
=> "--- \n- English\n- !binary |\n VMO8cmvDp2U=\n\n- \"\\xD0\\xA0\\xD1\\x83\\xD1\\x81\\xD1\\x81\\xD0\\xBA\\xD0\\xB8\\xD0\\xB9\"\n"
>> l2 = YAML.load(l.to_yaml)
=> ["English", "T\xC3\xBCrk\xC3\xA7e", "Русский"]
>> l2.map(&:encoding)
=> [#<Encoding:UTF-8>, #<Encoding:ASCII-8BIT>, #<Encoding:UTF-8>]
Does anyone know how YAML decides whether to store a string as
binary or as an escaped string? Both of the last two strings above
contain non-ASCII characters, but only the first of them ("Türkçe") is
stored as binary...
My current thinking is to hook the active record deserialization routine, walk hashes and arrays and force encoding on all the string elements. Not terribly safe or general, but would probably work for my use case, though I also wonder if anyone's patched YAML to be smarter here...
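The walk described above could be sketched roughly as follows. This is only a minimal sketch: retag_utf8 is a hypothetical helper name, it assumes every ASCII-8BIT string in the structure is really UTF-8 text, and it leaves out the part that hooks it into Active Record's deserialization. Strings whose bytes are not valid UTF-8 are returned untouched, which keeps genuinely binary data from being mislabeled.

```ruby
# Hypothetical helper: walk arrays and hashes and retag ASCII-8BIT strings
# as UTF-8. Assumes any binary-tagged string was UTF-8 text to begin with.
def retag_utf8(obj)
  case obj
  when String
    return obj unless obj.encoding == Encoding::ASCII_8BIT
    # force_encoding only changes the encoding tag; the bytes stay the same.
    candidate = obj.dup.force_encoding(Encoding::UTF_8)
    # Keep the original if the bytes do not form valid UTF-8.
    candidate.valid_encoding? ? candidate : obj
  when Array
    obj.map { |item| retag_utf8(item) }
  when Hash
    obj.each_with_object({}) do |(key, value), result|
      result[retag_utf8(key)] = retag_utf8(value)
    end
  else
    obj
  end
end

# Usage: pass whatever YAML.load returned through the helper.
# cleaned = retag_utf8(YAML.load(yaml_text))
```

The helper returns a new structure rather than mutating in place, so frozen strings and shared objects are not a problem, at the cost of copying the arrays and hashes.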
Comments (1)
I did come up with one solution:
Monkey patching String can force YAML to use \ escaping rather than binary, and therefore return strings in the default encoding (UTF-8 for me) rather than ASCII-8BIT.
Originally this routine uses some heuristics around which would be shorter, \ escaping or binary encoding of the string, which is why only some of the international strings I had were having problems.
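The patch itself is not shown in this comment. A minimal sketch of what such a monkey patch might look like is given here, under a loudly stated assumption: that the YAML emitter bundled with Ruby 1.9.1 (Syck) asks String#is_binary_data? before choosing between !binary and an escaped scalar. That method name is an assumption and is not confirmed anywhere in this thread, so check it against the YAML library source of your Ruby version before relying on it.

```ruby
# Assumption: the Ruby 1.9.1 YAML emitter (Syck) consults
# String#is_binary_data? when deciding how to dump a string.
# Always answering false would steer it toward \ escaping, never !binary.
class String
  def is_binary_data?
    false
  end
end
```

The trade-off of a patch like this is that strings holding genuinely binary data would also be written as escaped text rather than base64.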