Ruby - How do I unpack a binary string into a normal string?
I'm opening a CSV file and reading values from it using File.open(filename).
So I do something like this:
my_file = File.open(filename)
my_file.each_line do |line|
  line_array = line.split("\t")
  ratio = line_array[1]
  puts "#{ratio}"
  puts ratio.isutf8?
end
The issue I'm having is that the values in line_array seem to be in a strange format. For example, one of the values in a cell of the CSV file is 0.86. When I print it out, it looks like " 0 . 8 6".
So it kind of behaves like a string but I'm not sure how it's encoded. When I try to do some introspection:
ratio.isutf8?
I get this:
=> undefined method 'isutf8?' for "\0000\000.\0008\0006\000":String
What the heck is going on?! How do I get ratio into a normal string that I can then call ratio.to_f on?
Thanks.
Comments (2)
Unpacking a binary string is generally called decoding. It looks like your data is in UTF-16, but before assuming that, you should find out what encoding it actually uses (e.g. by investigating the workflow/configuration that produced it).
In Ruby 1.9 (decode on the fly):
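A minimal sketch of on-the-fly decoding, assuming the file is little-endian UTF-16 without a byte order mark (the byte order is a guess; use UTF-16BE if your file differs). The mode string "rb:UTF-16LE:UTF-8" asks Ruby to read in binary mode and transcode each line to UTF-8 as it is read. The temp file and its contents are made up for illustration.

```ruby
require "tempfile"

# Hypothetical sample data standing in for the real file: one tab-separated
# line, written as UTF-16LE bytes (an assumed encoding).
sample = Tempfile.new("ratios")
sample.binmode
sample.write("widget\t0.86\tok\n".encode("UTF-16LE"))
sample.close

ratios = []
# "rb:UTF-16LE:UTF-8" = binary mode, external encoding UTF-16LE,
# internal encoding UTF-8, so each line arrives already decoded.
File.open(sample.path, "rb:UTF-16LE:UTF-8") do |file|
  file.each_line do |line|
    fields = line.chomp.split("\t")
    ratios << fields[1].to_f
  end
end

puts ratios.inspect  # prints [0.86]
sample.unlink
```

Because the decoding happens inside the IO layer, the rest of the loop can treat each line as an ordinary UTF-8 string.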
In Ruby 1.8 (read in whole file, then decode and parse it; may not work for super large files):
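A sketch of the read-then-decode approach. On Ruby 1.8 the conversion would go through Iconv, along the lines of Iconv.conv('UTF-8', 'UTF-16LE', data), but Iconv was dropped from the standard library in Ruby 2.0, so the runnable version below uses String#encode for the same step. The UTF-16LE byte order and the sample file are assumptions for illustration.

```ruby
require "tempfile"

# Hypothetical sample file, written as UTF-16LE bytes (an assumed encoding).
sample = Tempfile.new("ratios")
sample.binmode
sample.write("widget\t0.86\ngadget\t1.25\n".encode("UTF-16LE"))
sample.close

# Read the raw bytes in one go, label them as UTF-16LE, then transcode.
raw = File.binread(sample.path)
text = raw.force_encoding("UTF-16LE").encode("UTF-8")

# Parse only after the whole buffer has been decoded.
ratios = text.each_line.map { |line| line.chomp.split("\t")[1].to_f }

puts ratios.inspect  # prints [0.86, 1.25]
sample.unlink
```

The whole file is held in memory twice here (raw and decoded), which is why this variant is a poor fit for very large inputs.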
Looks like your input data is encoded as UTF-16 or UCS-2.
Try something like this:
Come to think of it, you should probably run Iconv.conv on the whole line before calling split on it; otherwise there will be stray zero bytes at the ends of the strings (unless you change your delimiter to '\000\t', which looks rather ugly).
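To illustrate why the order matters, here is a small sketch that splits the same line before and after decoding. It assumes UTF-16LE and uses String#encode in place of Iconv (which is no longer bundled with Ruby); the sample line is invented.

```ruby
# One tab-separated line as raw UTF-16LE bytes (encoding assumed for illustration).
raw_line = "widget\t0.86\tok".encode("UTF-16LE").force_encoding("BINARY")

# Splitting the undecoded bytes cuts each two-byte tab in half,
# leaving stray zero bytes attached to the neighbouring fields.
broken = raw_line.split("\t")

# Decoding the whole line first, then splitting, yields ordinary UTF-8 fields.
decoded = raw_line.dup.force_encoding("UTF-16LE").encode("UTF-8")
fields = decoded.split("\t")

p broken[1].bytes  # => [0, 48, 0, 46, 0, 56, 0, 54, 0]
p fields[1]        # => "0.86"
p fields[1].to_f   # => 0.86
```

The byte sequence in broken[1] is the same "\0000\000.\0008\0006\000" pattern shown in the question.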