Ruby - How do I unpack a binary string into a normal string?

Posted 2024-09-06 11:31:05 | 573 characters | 6 views | 0 comments

I'm opening a CSV file and reading values from it using File.open(filename).

So I do something like this:

my_file = File.open(filename)
my_file.each_line do |line|
 line_array = line.split("\t")
 ratio = line_array[1]
 puts "#{ratio}"
 puts ratio.isutf8?
end

The issue I'm having is that the values in line_array seem to be in a strange format. For example, one of the values in a cell of the CSV file is 0.86. When I print it out, it looks like " 0 . 8 6".

So it kind of behaves like a string but I'm not sure how it's encoded. When I try to do some introspection:

ratio.isutf8?

I get this:

=> undefined method 'isutf8?' for "\0000\000.\0008\0006\000":String

What the heck is going on?! How do I get ratio into a normal string that I can then call ratio.to_f on?

Thanks.

子栖 2024-09-13 11:31:05

Unpacking a binary string is generally called decoding. It looks like your data is in UTF-16, but before assuming this is true you should find out what encoding it is actually using (e.g. by investigating the workflow/configuration that produced it).
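If the producing workflow cannot be inspected, looking at the raw bytes can at least support or contradict the UTF-16 guess. The following is only a rough heuristic sketch, not a real encoding detector: guess_utf16 is a made-up helper name, and the sample bytes are the ones shown in the question's error message.

```ruby
# Hypothetical helper: guesses the UTF-16 byte order from a BOM or,
# failing that, from where the zero bytes fall. Returns nil if unsure.
def guess_utf16(raw)
  bytes = raw.bytes
  return "UTF-16BE" if bytes[0, 2] == [0xFE, 0xFF] # big-endian BOM
  return "UTF-16LE" if bytes[0, 2] == [0xFF, 0xFE] # little-endian BOM

  # For mostly-ASCII text, UTF-16BE puts a zero byte first in each pair,
  # UTF-16LE puts it second.
  first_zeros  = bytes.each_slice(2).count { |hi, _| hi == 0 }
  second_zeros = bytes.each_slice(2).count { |_, lo| lo == 0 }
  return "UTF-16BE" if first_zeros > second_zeros
  return "UTF-16LE" if second_zeros > first_zeros
  nil
end

# The bytes from the question: "0.86" with a zero byte before each character
# and one stray zero byte at the end.
raw = "\x000\x00.\x008\x006\x00".b
guess = guess_utf16(raw) # => "UTF-16BE"
```

A result of nil here only means the heuristic found no evidence; it says nothing about which encoding the data actually uses.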

In Ruby 1.9 (decode on the fly):

my_file = File.open(filename).set_encoding('UTF-16BE:UTF-8')
# the rest as in the original
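For what it's worth, on Ruby 1.9 or later the same on-the-fly decoding can also be requested in the mode string of File.open. A minimal sketch, assuming the file really is UTF-16BE without a BOM; the sample rows and the file name ratios.tsv are made up for illustration:

```ruby
require "tmpdir"

# Hypothetical sample data standing in for the asker's file:
# two tab-separated rows, stored as UTF-16BE without a BOM.
sample = "a\t0.86\tx\nb\t1.25\ty\n"

ratios = []
Dir.mktmpdir do |dir|
  filename = File.join(dir, "ratios.tsv")
  File.binwrite(filename, sample.encode("UTF-16BE"))

  # "r:UTF-16BE:UTF-8" = decode the file as UTF-16BE, hand back UTF-8 strings.
  File.open(filename, "r:UTF-16BE:UTF-8") do |my_file|
    my_file.each_line do |line|
      line_array = line.chomp.split("\t")
      ratios << line_array[1].to_f
    end
  end
end

ratios # => [0.86, 1.25]
```

Because each line is already UTF-8 by the time split sees it, no zero bytes end up in the fields and to_f behaves as expected.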

In Ruby 1.8 (read in whole file, then decode and parse it; may not work for super large files):

require 'iconv'

# …

my_file = File.open(filename)
my_text = Iconv.conv('UTF-8', 'UTF-16BE', my_file.read)
my_text.each_line do |line|
 # the rest as in the original
end
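If the iconv library is not available (it was dropped from the standard library as of Ruby 2.0), String#encode can do the same whole-file conversion. A sketch under the same UTF-16BE assumption, with a made-up byte string standing in for my_file.read:

```ruby
# Hypothetical stand-in for the raw bytes returned by my_file.read.
raw = "a\t0.86\nb\t1.25\n".encode("UTF-16BE").b

# encode(destination, source): interpret the bytes as UTF-16BE and
# transcode them to UTF-8, like Iconv.conv('UTF-8', 'UTF-16BE', raw).
my_text = raw.encode("UTF-8", "UTF-16BE")

ratios = my_text.each_line.map do |line|
  line.chomp.split("\t")[1].to_f
end

ratios # => [0.86, 1.25]
```

As with the Iconv version, this reads and converts everything in memory, so the same caveat about very large files applies.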

橘味果▽酱 2024-09-13 11:31:05

Looks like your input data is encoded as UTF-16 or UCS-2.

Try something like this:

require 'iconv'

ratio = Iconv.conv('UTF-8', 'UTF-16', line_array[1])
puts "Ratio is now '#{ratio}'."

Come to think of it, you should probably run Iconv.conv on the whole line before calling split on it; otherwise there will be stray zero bytes at the end of the strings (unless you change your delimiter to "\000\t", which looks rather ugly).
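To illustrate the stray-byte problem, here is a small sketch that splits a raw line before and after decoding. It assumes big-endian UTF-16 without a BOM and uses String#encode rather than Iconv so that it also runs on newer Rubies; the sample line is made up.

```ruby
# Hypothetical raw line as read from the file: UTF-16BE bytes, tagged binary.
line = "a\t0.86\tx\n".encode("UTF-16BE").b

# Splitting first: the zero byte that belongs to the following tab
# stays glued to the end of the field.
broken = line.split("\t")[1]
broken.bytesize # => 9 (four 2-byte characters plus one stray zero byte)

# Decoding the whole line first gives clean fields.
decoded = line.encode("UTF-8", "UTF-16BE")
ratio = decoded.chomp.split("\t")[1]
ratio      # => "0.86"
ratio.to_f # => 0.86
```

The odd byte count is also why converting the individual field tends to fail or produce garbage: the field no longer consists of whole 2-byte units.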
