Ruby: from any encoding to ASCII
I mainly have to deal with English letters and punctuation marks; I don't have to worry about European accents. My only concern is when a user pastes something copied from the web that includes, for instance, an apostrophe. When I do a puts in the console (on Win7), it outputs
"ItΓÇÖs" # whereas it actually is "It's"
So my main question is: is there an end-all conversion method I can use in Ruby that properly replaces all the ,.;?!"'~` _- with their ASCII counterparts?
I really understand very little about encodings. If you think this is the wrong question to ask, which may very well be the case, please advise what I should look for instead.
Thank you.
Comments (3)
I work in publishing, where we deal with this a lot. We have had success with stringex (https://github.com/rsl/stringex). It has a to_ascii method that normalizes Unicode dashes, etc.
And in Ruby 2.0:
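The snippet that went with this comment is not preserved here. As a stand-in, and not the commenter's code, here is a minimal sketch of one approach using only core Ruby: String#encode with a fallback for characters that have no ASCII equivalent. The ASCII_PUNCTUATION table and the to_plain_ascii helper are hypothetical names made up for illustration, and the table covers only a handful of common typographic characters. For context, the ΓÇÖ in the question is how the three UTF-8 bytes of the curly apostrophe (U+2019) render in code page 437.

```ruby
# Hypothetical mapping from common typographic characters to ASCII stand-ins.
# Extend it with whatever characters your users actually paste.
ASCII_PUNCTUATION = {
  "\u2018" => "'",   # left single quotation mark
  "\u2019" => "'",   # right single quotation mark (curly apostrophe)
  "\u201C" => '"',   # left double quotation mark
  "\u201D" => '"',   # right double quotation mark
  "\u2013" => "-",   # en dash
  "\u2014" => "-",   # em dash
  "\u2026" => "...", # horizontal ellipsis
  "\u00A0" => " "    # no-break space
}.freeze

# Hypothetical helper: transcode UTF-8 text to US-ASCII.
# Characters listed in the table get their stand-in; any other
# non-ASCII character becomes "?". Invalid bytes are also replaced.
def to_plain_ascii(text)
  text.encode(
    "US-ASCII",
    invalid: :replace,
    fallback: ->(char) { ASCII_PUNCTUATION.fetch(char, "?") }
  )
end

puts to_plain_ascii("It\u2019s \u201Cfine\u201D \u2013 really\u2026")
```

This assumes the input is tagged as UTF-8. If accented letters ever start to matter, the "?" default would need a richer table or a transliteration library.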
For programmatically handling multibyte encodings, iconv is your friend. Also, James Gray wrote a series of blog articles about how to take the problem apart and convert encodings. The problem gets more complicated when dealing with text that has been pasted in, because some characters could be in one multibyte encoding and other characters in another. You might have to walk the string checking for multibyte characters, ask Ruby what the encoding is, convert to the expected or desired encoding if it's not what you expect, and then move to the next character. Gray's articles cover it all nicely and are good reading.
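A rough sketch of the check-and-convert idea described above, using String#encode rather than iconv (Iconv was removed from Ruby's standard library in 2.0). A Ruby string carries a single encoding tag, so this sketch validates the whole string and then walks it character by character to find the multibyte ones. The helper name non_ascii_chars and the guess that non-UTF-8 input is Windows-1252 are assumptions for illustration only.

```ruby
# Hypothetical helper: return the distinct non-ASCII characters in pasted
# text, after coercing the text to valid UTF-8.
def non_ascii_chars(text)
  # First, try to read the raw bytes as UTF-8.
  utf8 = text.dup.force_encoding(Encoding::UTF_8)

  unless utf8.valid_encoding?
    # Assumption: bytes that are not valid UTF-8 came from Windows-1252.
    # Bytes with no mapping there are replaced instead of raising.
    utf8 = text.dup.force_encoding(Encoding::Windows_1252)
               .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
  end

  # Walk the string and keep only the characters that are not plain ASCII.
  utf8.each_char.reject(&:ascii_only?).uniq
end

p non_ascii_chars("It\u2019s")
```

Listing the offending characters first makes it easier to decide which ones deserve an explicit ASCII replacement and which can simply be dropped.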