How to remove whitespace in Ruby without removing UTF-8 characters

Posted on 2024-10-19 04:01:38

I want to prevent users from submitting an empty comment (whitespace, &nbsp;, etc.), so I apply the following:

var.gsub(/^\s+|\s+\z|\s*&nbsp;\s*/, '')

However, a clever user then found a loophole by using the \302 or \240 characters, so I filtered those out too.

Then I ran into a problem when I introduced support for several languages: a word like Déjà vu now triggers an error, because the encoding of the à character contains \240. Is there any way to remove the whitespace but leave the Latin characters untouched?
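
The clash described here can be reproduced in a few lines. The sketch below assumes Ruby 2.0 or later with UTF-8 strings, and the helper name octal_bytes is made up for illustration. It prints the UTF-8 bytes of a non-breaking space (U+00A0) and of à (U+00E0) in octal. Both end in \240, so deleting that byte on its own corrupts à.

```ruby
# Non-breaking space (U+00A0) and "à" (U+00E0) as UTF-8 strings.
nbsp    = "\u00A0"
a_grave = "\u00E0"

# Hypothetical helper: render each byte in octal, matching the \302/\240 notation.
def octal_bytes(str)
  str.bytes.map { |b| format('\\%o', b) }.join
end

puts octal_bytes(nbsp)     # \302\240
puts octal_bytes(a_grave)  # \303\240

# Deleting the byte \240 on its own (what a byte-level filter does)
# leaves "à" with only its lead byte, which is no longer valid UTF-8.
broken = a_grave.b.delete("\xA0".b).force_encoding('UTF-8')
puts broken.valid_encoding? # false
```

In other words, \302 and \240 are not two characters but the two bytes of one character, so any filter has to work on whole characters rather than on single bytes.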

南汐寒笙箫 2024-10-26 04:01:38

One way around this is to use iconv to discard invalid UTF-8 byte sequences (such as \240 on its own) before using your regexp to remove the whitespace:

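require 'iconv' # standard library in Ruby 1.8/1.9; a separate gem since Ruby 2.0

var1 = "Déjà vu"
var2 = "\240" # a lone continuation byte, not valid UTF-8

# Converting UTF-8 to UTF-8 with //IGNORE drops byte sequences that are not valid UTF-8
ic = Iconv.new('UTF-8//IGNORE', 'UTF-8')
valid1 = ic.iconv(var1) # => "D\303\251j\303\240 vu"
valid2 = ic.iconv(var2) # => ""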
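
On Ruby 2.1 and later the same idea can probably be expressed without iconv. The following is only a sketch under that assumption, not the answerer's method: String#scrub drops invalid byte sequences, and the POSIX class [[:space:]] matches Unicode whitespace such as U+00A0, which \s does not. The helper names clean_comment and blank_comment? are made up for illustration.

```ruby
# Hypothetical helper: drop invalid bytes, then trim Unicode whitespace and
# &nbsp; entities from both ends, leaving accented letters untouched.
def clean_comment(text)
  cleaned = text.dup.force_encoding('UTF-8').scrub('') # remove invalid byte sequences
  cleaned = cleaned.gsub(/&nbsp;/i, ' ')               # treat the HTML entity as a space
  cleaned.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')    # trim, including U+00A0
end

# Hypothetical helper: a comment is blank when nothing is left after cleaning.
def blank_comment?(text)
  clean_comment(text).empty?
end

p blank_comment?("  \n\t ")                # => true
p blank_comment?("\u00A0&nbsp; \u00A0")    # => true
p blank_comment?("\xA0")                   # => true (lone invalid byte)
p blank_comment?("Déjà vu")                # => false
p clean_comment("  Déjà vu\u00A0") == "Déjà vu" # => true
```

Scrubbing first matters: running gsub on a string that still contains invalid bytes raises an ArgumentError, so the whitespace regexp only sees well-formed characters.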
