Checking for specific strings with regex
I have a list of arbitrary length of type String. I need to ensure each String element in the list is alphanumeric or numeric, with no spaces, and possibly containing special characters such as - \ / _ etc.
Examples of accepted strings include:
J0hn-132ss/sda
Hdka349040r38yd
Hd(ersd)3r4y743-2\d3
123456789
Examples of unacceptable strings include:
Hello
Joe
King
etc. Basically, no plain words.
I'm currently using stringInstance.matches("regex") but I'm not too sure how to write the appropriate expression:
if (str.matches("^[a-zA-Z0-9_/-\\|]*$")) return true;
else return false;
This method always returns true, even for words that don't conform to the format I mentioned.
A description of the regex I'm looking for, in English, would be something like:
Any String that contains characters from (a-zA-Z AND 0-9 AND special characters)
OR (0-9 AND special characters)
OR (0-9)
Edit: I have come up with the following expression, which works, but I feel it may be bad in that it is unclear or too complex.
The expression:
(([\\pL\\pN\\pP]+[\\pN]+|[\\pN]+[\\pL\\pN\\pP]+)|([\\pN]+[\\pP]*)|([\\pN]+))+
I used this website to help me: http://xenon.stanford.edu/~xusch/regexp/analyzer.html
Note that I'm still new to regex.
WARNING: “Never” Write A-Z
All instances of ranges like A-Z or 0-9 that occur outside an RFC definition are virtually always ipso facto wrong in Unicode. In particular, things like [A-Za-z] are horrible antipatterns: they’re sure giveaways that the programmer has a caveman mentality about text that is almost wholly inappropriate this side of the Millennium. The Unicode patterns work on ASCII, but the ASCII patterns break on Unicode, sometimes in ways that leave you open to security violations. Always write the Unicode version of the pattern no matter whether you are using 1970s data or modern Unicode, because that way you won’t screw up when you actually use real Java character data. It’s like the way you use your turn signal even when you “know” there is no one behind you, because if you’re wrong, you do no harm, whereas the other way, you very most certainly do.
Get used to using the 7 Unicode categories:
\pL for Letters. Notice how \pL is a lot shorter to type than [A-Za-z].
\pN for Numbers.
\pM for Marks that combine with other code points.
\pS for Symbols, Signs, and Sigils. :)
\pP for Punctuation.
\pZ for Separators like spaces (but not control characters).
\pC for other invisible formatting and Control characters, including unassigned code points.
Solution
If you just want a pattern, you want
although in Java 7 you can do this:
assuming you don’t mind underscores and letters with arbitrary combining marks. Otherwise you have to write the very awkward:
The (?U) is new to Java 7. It corresponds to the Pattern class’s UNICODE_CHARACTER_CLASSES compilation flag. It switches the POSIX character classes like [:alpha:] and the simple shortcuts like \w to actually work with the full Java character set. Normally, they work only on the 1970sish ASCII set, which can be a security hole.
There is no way to make Java 7 always do this with its patterns without being told to, but you can write a frontend function that does this for you. You just have to remember to call yours instead.
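As a rough sketch of the "frontend function" idea, assuming Java 7 or later: the class name UnicodePatterns and its method names below are hypothetical, while Pattern.UNICODE_CHARACTER_CLASSES and the embedded (?U) flag are the ones described above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class UnicodePatterns {
    private UnicodePatterns() {}

    // Front-end function: compiles every pattern with the Unicode flag switched on,
    // so callers cannot forget it.
    static Pattern compile(String regex) {
        return Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASSES);
    }

    // Default behaviour: \w covers only ASCII letters, digits and underscore.
    static boolean asciiWord(String s) {
        return Pattern.compile("\\w+").matcher(s).matches();
    }

    // Same shortcut, compiled through the front-end function above.
    static boolean unicodeWord(String s) {
        return compile("\\w+").matcher(s).matches();
    }

    // Same effect using the embedded (?U) flag inside the pattern itself.
    static boolean inlineFlagWord(String s) {
        return Pattern.matches("(?U)\\w+", s);
    }
}
```

With the string "élève" (written as "\u00e9l\u00e8ve" to stay independent of source encoding), asciiWord should reject it while unicodeWord and inlineFlagWord should accept it.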
Note that patterns in Java before v1.7 cannot be made to work according to the way UTS#18 on Unicode Regular Expressions says they must. Because of this, you leave yourself open to a wide range of bugs, infelicities, and paradoxes if you do not use the new Unicode flag. For example, the trivial and common pattern \b\w+\b will not be found to match anywhere at all within the string "élève", let alone in its entirety.
Therefore, if you are using patterns in pre-1.7 Java, you need to be extremely careful, far more careful than anyone ever is. You cannot use any of the POSIX charclasses or charclass shortcuts, including \w, \s, and \b, all of which break on anything but stone-age ASCII data. They cannot be used on Java’s native character set.
In Java 7 they can, but only with the right flag.
It is possible to rephrase the description of the needed regex as "contains at least one number", so the following would work: /.*[\pN].*/. Or, if you would like to limit your search to letters, numbers and punctuation, you should use /[\pL\pN\pP]*[\pN][\pL\pN\pP]*/. I've tested it on your examples and it works fine.
You can also write the regexp with lazy quantifiers, like this: /.*?[\pN].*?/. This matches the same strings and reaches the first number from the left instead of backtracking from the end, although when there is no number at all both versions still have to scan the whole string before failing.
I would like to recommend a great book on regular expressions: Mastering Regular Expressions. It has a great introduction, an in-depth explanation of how regular expressions work, and a chapter on regular expressions in Java.
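A minimal sketch of the "contains at least one number" check in Java, assuming the letters/numbers/punctuation variant of the pattern above; the class and method names are made up for illustration. The backslashes are doubled because the pattern lives in a Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical class name; only the pattern itself comes from the answer above.
final class ContainsNumberCheck {
    private ContainsNumberCheck() {}

    // Letters, numbers and punctuation only, with at least one number somewhere.
    static final Pattern AT_LEAST_ONE_NUMBER =
            Pattern.compile("[\\pL\\pN\\pP]*\\pN[\\pL\\pN\\pP]*");

    // matches() requires the whole string to fit the pattern, so no anchors are needed.
    static boolean isAcceptable(String s) {
        return AT_LEAST_ONE_NUMBER.matcher(s).matches();
    }
}
```

On the question's examples, isAcceptable should return true for J0hn-132ss/sda, Hdka349040r38yd, Hd(ersd)3r4y743-2\d3 and 123456789, and false for Hello, Joe and King. A string containing a space is rejected too, because a space is a separator (\pZ) rather than a letter, number or punctuation mark.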
It looks like you just want to make sure that there are no spaces in the string. If so, you can do this very simply:
This will return true if there are no spaces (valid by my understanding of your rules), and false if there is a space anywhere in the string (invalid).
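One way such a check might look, purely as an assumption and not necessarily the snippet this answer had in mind: the class and method names are hypothetical, and \S here is Java's default (ASCII) non-whitespace shortcut.

```java
// Hypothetical helper illustrating a "no whitespace anywhere" check.
final class NoSpaceCheck {
    private NoSpaceCheck() {}

    // True when the string contains no whitespace character at all
    // (an empty string also passes, since \S* can match nothing).
    static boolean hasNoWhitespace(String s) {
        return s.matches("\\S*");
    }
}
```

Note that under this reading a plain word such as Hello also passes, because it contains no spaces.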
Here is a partial answer, which covers the cases "0-9 and special characters" OR "0-9".
^([\d]+|[\\/\-_]*)*$
This can be read as ((1 or more digits) OR (0 or more of the special characters \ / - _)) 0 or more times. It accepts strings made up only of digits. Note, however, that as written it also accepts the empty string and strings consisting only of special characters, so it does not by itself require a digit. I used a regex tester to test several of the strings.
Adding alphabetic characters seems easy, but a repetition of the given regexp may be required.
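A hedged sketch of the "digits and special characters" case in Java. Every alternative in the pattern above can match without consuming a digit, so this variant adds a lookahead that insists on at least one. This is a swapped-in technique rather than the regex given above, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical class; covers only digits plus the special characters \ / - _
final class DigitsAndSpecials {
    private DigitsAndSpecials() {}

    // (?=.*\d)   lookahead: there must be a digit somewhere in the string
    // [\d\\/_-]+ then only digits, backslash, slash, underscore or hyphen
    static final Pattern DIGITS_WITH_SPECIALS =
            Pattern.compile("(?=.*\\d)[\\d\\\\/_-]+");

    static boolean isAcceptable(String s) {
        return DIGITS_WITH_SPECIALS.matcher(s).matches();
    }
}
```

Under these assumptions, 123456789 and 12-34/5_6 are accepted, while ---, the empty string and Hello are rejected. Letters are still outside the character class, as in the partial answer.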