Unicode regex to match line breaks?
I have this form from which I want to submit data to a database. The data is UTF-8. I am having trouble matching line breaks. The pattern I am using is something like this:
~^[\p{L}\p{M}\p{N} ]+$~u
This pattern works fine until the user puts a new line in the text box. I have tried using \p{Z} inside the class, but with no success. I also tried "s", but it didn’t work.
Any help is much appreciated. Thanks!
Comments (1)
A Unicode linebreak is either a carriage return immediately followed by a line feed, or else it is any character with the vertical whitespace property.
But it looks like you’re trying to match generic whitespace there. In Java, that takes an explicit bracketed class listing each whitespace code point, which can be shortened “only” somewhat by using ranges. The class has to include both horizontal whitespace (\h) and vertical whitespace (\v), which may or may not be the same as general whitespace (\s).
It also looks like you’re trying to match alphanumerics.
Alphabetics alone are usually [\pL\pM\p{Nl}].
Numerics are not \pN as often as they are either just \p{Nd} or else sometimes [\p{Nd}\p{Nl}].
Word (identifier) characters are [\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]], if your regex engine supports those sorts of operations (Java’s does). That’s what \w works out to in Unicode-aware regex languages (of which Java is not one).
In older versions of Perl, you would likely write a linebreak by spelling out the alternation described above (a \r\n pair, or else a vertical-whitespace character), although that’s now better written as an alternation of \r\n and \v, which is exactly what \R matches.
Java is very clumsy at these things. There you must write a linebreak as an explicit alternation over the individual code points, which of course requires extra bbaacckkssllasshheess when written as a string.
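A minimal Java sketch of such a linebreak pattern, assuming the definition given at the top of this answer (a CR LF pair, or else a single vertical-whitespace character). The class, constant, and method names are made up for illustration, and the code-point list is my reading of "vertical whitespace", not necessarily the exact pattern the answer originally showed:

```java
import java.util.regex.Pattern;

final class UnicodeLinebreak {
    // CR LF taken as one unit, or else one vertical-whitespace code point:
    // U+000A..U+000D (LF, VT, FF, CR), U+0085 (NEL), U+2028, U+2029.
    // Every regex backslash is doubled because this is a Java string literal.
    static final String LINEBREAK =
        "(?:\\u000D\\u000A|[\\u000A-\\u000D\\u0085\\u2028\\u2029])";

    private static final Pattern ONE_LINEBREAK = Pattern.compile(LINEBREAK);

    // True when the whole input is exactly one linebreak sequence.
    static boolean isLinebreak(String s) {
        return ONE_LINEBREAK.matcher(s).matches();
    }

    // Splits text on any linebreak; the -1 limit keeps trailing empty lines.
    static String[] splitLines(String text) {
        return ONE_LINEBREAK.split(text, -1);
    }

    public static void main(String[] args) {
        System.out.println(isLinebreak("\r\n"));
        System.out.println(splitLines("a\r\nb\nc").length);
    }
}
```

On Java 8 and later, the built-in `\R` escape covers the same set of sequences, so the hand-written alternation is mainly of interest on older runtimes.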
I give the remaining Java equivalents for the 14 common character-class regex escapes, written so that they work with Unicode, in a separate answer. You may have to use those in other Java-like regex languages that aren’t sufficiently Unicode-aware.
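As a rough illustration of how the pattern from the question might be adapted in Java so that it also accepts line breaks, the sketch below widens the literal space to an explicit Unicode whitespace class. The class and member names are hypothetical, and the code-point list follows the current Unicode White_Space property as I understand it, so treat it as an assumption and not as the exact class from the original answer:

```java
import java.util.regex.Pattern;

final class UnicodeTextField {
    // Code points carrying the Unicode White_Space property, using ranges.
    // This covers horizontal whitespace (tab, space, no-break space, ...)
    // and vertical whitespace (LF, VT, FF, CR, NEL, U+2028, U+2029).
    static final String WHITESPACE =
        "[\\u0009-\\u000D\\u0020\\u0085\\u00A0\\u1680\\u2000-\\u200A"
        + "\\u2028\\u2029\\u202F\\u205F\\u3000]";

    // Letters, marks, numbers, or any whitespace above; nesting one bracketed
    // class inside another is a union in Java regex syntax.
    private static final Pattern FIELD =
        Pattern.compile("[\\p{L}\\p{M}\\p{N}" + WHITESPACE + "]+");

    // True when the entire input consists only of the allowed characters.
    static boolean isValid(String input) {
        return FIELD.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("first line\r\nsecond line"));
        System.out.println(isValid("semi;colon"));
    }
}
```

Because `matches()` already requires the whole input to match, the `^` and `$` anchors from the original pattern are not needed here. On Java 7 and later, compiling with `Pattern.UNICODE_CHARACTER_CLASS` makes `\s` follow the Unicode White_Space property, which may make the hand-written class unnecessary.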