Outputting a list of unique Unicode characters in Ruby

Posted on 2025-01-02 16:44:03

I am parsing some text in Ruby that contains Unicode characters, which I would like to transcribe to ASCII values in one output file and to HTML entities in another. Is there a simple way to list the non-ASCII characters found in a file? For example:

\u00A0 # should become a " " in the text file, but &nbsp; in the HTML output file

I'm going to transcribe them manually based on my needs, so I would like to output a list of the unique characters I'll need to transcribe from my initial input file.

Thanks,
Ben

Comments (1)

最终幸福 2025-01-09 16:44:03

The String#chars method extracts the characters found in your string:

"foo\u00A0bar".chars.to_a
# => ["f", "o", "o", " ", "b", "a", "r"]

Since some of these characters may be multi-byte Unicode characters, you might also want to expand each one into its bytes to be more thorough:

"foo\u00A0bar".chars.to_a.collect { |c| [ c, c.bytes.to_a ] }
# => [["f", [102]], ["o", [111]], ["o", [111]], [" ", [194, 160]], ["b", [98]], ["a", [97]], ["r", [114]]]

Each pair in the array lists a character alongside the specific bytes used to encode it. In this case the non-breaking space shows up as " " but is actually stored as the UTF-8 bytes [194, 160].
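To get from there to the list the question asks for, one possible approach is to keep only the characters that are not ASCII and then de-duplicate them. The following is a minimal sketch rather than part of the original answer: the helper name `unique_non_ascii` and the sample string are made up for illustration, and it assumes the input is valid UTF-8.

```ruby
# Hypothetical helper (not from the original answer): returns the unique
# non-ASCII characters in a string, sorted by code point.
# Assumes the text is valid UTF-8.
def unique_non_ascii(text)
  text.each_char            # walk the string one character at a time
      .reject(&:ascii_only?) # drop anything in the 7-bit ASCII range
      .uniq                  # keep each remaining character once
      .sort                  # UTF-8 byte order matches code point order
end

# Illustrative sample containing a non-breaking space (twice),
# an e-acute, and a right single quotation mark.
sample = "foo\u00A0bar caf\u00E9\u00A0 it\u2019s"

unique_non_ascii(sample).each do |c|
  # Print the code point in U+XXXX form next to the character itself,
  # since characters such as the non-breaking space are hard to spot by eye.
  puts format("U+%04X %s", c.ord, c)
end
```

For a real input file, the same helper could be fed something like File.read("input.txt", encoding: "UTF-8"), where input.txt is a placeholder name. Printing the code point alongside each character should make it easier to build the manual ASCII and HTML transcription tables afterwards.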
