Java - unknown characters passing as [a-zA-z0-9]*?
I'm no expert in regex, but I need to parse some input I have no control over and make sure I filter away any strings that don't consist of A-Z, a-z and/or 0-9.
When I run this,
Pattern p = Pattern.compile("^[a-zA-Z0-9]*$"); //fixed typo
if(!p.matcher(gottenData).matches())
    System.out.println(someData); //someData contains gottenData
certain spaces plus an unknown symbol somehow slip through the filter (gottenData is the red rectangle):
In case you're wondering, it DOES also display text; it's not all like that.
For now, I don't mind the [?] as long as the string also contains some text along with it.
Please help.
[EDIT] As far as I can tell from the (very large) input, the [?]'s are either whitespace or nothing at all; maybe there's some sort of encoding issue, or perhaps it has something to do with #text nodes (the input is XML).
Answers (7)
The * quantifier matches "zero or more", which means the pattern also matches an empty string, one that contains none of the characters in your class. Try the + quantifier, which means "one or more":
^[a-zA-Z0-9]+$
will match strings made up of alphanumeric characters only.
^.*[a-zA-Z0-9]+.*$
will match any string containing one or more alphanumeric characters, although the leading .* will make it much slower. If you use Matcher.find() instead of Matcher.matches(), a full string match is not required and you can use the regex [a-zA-Z0-9]+. (Matcher.lookingAt() does not require a full match either, but it only matches at the start of the input.)
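To make the difference between the quantifiers and the matching methods concrete, here is a minimal sketch. The class name, method names, and sample inputs are assumptions made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative, not from the original post.
class QuantifierDemo {
    static final Pattern STAR = Pattern.compile("[a-zA-Z0-9]*");
    static final Pattern PLUS = Pattern.compile("[a-zA-Z0-9]+");

    // True only if the whole input is one or more alphanumeric characters.
    static boolean isAlphanumeric(String s) {
        return PLUS.matcher(s).matches();
    }

    // True if the input contains at least one alphanumeric character anywhere.
    static boolean containsAlphanumeric(String s) {
        return PLUS.matcher(s).find();
    }

    // True only if the input begins with an alphanumeric character.
    static boolean startsWithAlphanumeric(String s) {
        return PLUS.matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(STAR.matcher("").matches());       // true: * accepts zero characters
        System.out.println(isAlphanumeric(""));               // false: + needs at least one
        System.out.println(isAlphanumeric("abc123"));         // true
        System.out.println(containsAlphanumeric("  abc "));   // true: find() searches anywhere
        System.out.println(startsWithAlphanumeric("  abc ")); // false: lookingAt() starts at index 0
    }
}
```

If the goal is "keep strings that contain some text", containsAlphanumeric is probably the closest fit; isAlphanumeric rejects anything with whitespace in it.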
You have an error in your regex: instead of [a-zA-z0-9]* it should be [a-zA-Z0-9]*.
You don't need ^ and $ around the regex. Matcher.matches() always matches against the complete string.
The example that accompanied this answer printed "doesn't match."
The correct answer is a combination of the above answers. First, I imagine your intended character class is [a-zA-Z0-9]. Note that A-z isn't as bad as you might think: it includes all characters in the ASCII range between A and z, which is the letters plus a few extras (specifically [, \, ], ^, _ and `).
A second potential problem, as Martin mentioned, is that you may need to put in the start and end anchors if you want the string to consist only of letters and numbers.
Finally, you use the * quantifier, which means 0 or more. The pattern can therefore match zero characters: an empty string passes matches(), and an unanchored search with find() would succeed on any input. What you need is the + quantifier. So I submit that the pattern you are most likely looking for is:
^[a-zA-Z0-9]+$
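As a quick check of the A-z remark, the sketch below walks the ASCII range from 'A' to 'z' and collects everything that is not a letter. The class and method names are hypothetical, chosen only for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative assumptions.
class RangeDemo {
    // Collects the characters between 'A' and 'z' (inclusive) that are not letters.
    static String extrasInRange() {
        StringBuilder sb = new StringBuilder();
        for (char c = 'A'; c <= 'z'; c++) {
            if (!Character.isLetter(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The six characters that sit between 'Z' and 'a' in ASCII.
        System.out.println(extrasInRange());

        // The typo'd class accepts an underscore; the corrected class does not.
        System.out.println(Pattern.matches("[a-zA-z0-9]+", "foo_bar")); // true
        System.out.println(Pattern.matches("[a-zA-Z0-9]+", "foo_bar")); // false
    }
}
```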
You have to change the regexp to
"^[a-zA-Z0-9]*$"
to ensure that you are matching the entire string.
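Whether the anchors change anything depends on which Matcher method is called. The following is a small sketch under assumed names and an assumed sample input ("abc 123"); it compares the bare and anchored patterns under matches() and find().

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and the sample input are illustrative assumptions.
class AnchorDemo {
    static final Pattern BARE = Pattern.compile("[a-zA-Z0-9]*");
    static final Pattern ANCHORED = Pattern.compile("^[a-zA-Z0-9]*$");

    // matches() must consume the entire input, with or without ^ and $.
    static boolean wholeMatch(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    // find() looks for any matching region, so here the anchors change the result.
    static boolean partialMatch(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "abc 123"; // contains a space
        System.out.println(wholeMatch(BARE, input));       // false
        System.out.println(wholeMatch(ANCHORED, input));   // false
        System.out.println(partialMatch(BARE, input));     // true
        System.out.println(partialMatch(ANCHORED, input)); // false
    }
}
```

With matches(), as in the question's code, both patterns behave the same, so adding the anchors is harmless but would not by itself change what gets filtered.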
Looks like it should be "a-zA-Z0-9", not "a-zA-z0-9", try correcting that...
Did anyone consider adding a space to the regex?
[a-zA-Z0-9 ]*
This should match any normal text made of letters, numbers and spaces. If you want quotes and other special characters, add them to the regex too. You can quickly test your regex at http://www.regexplanet.com/simple/
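One caveat worth checking: a literal space in the class matches only U+0020, so tabs or a non-breaking space (a plausible candidate for the question's unknown symbol, though that is a guess) are still rejected. A minimal sketch, with a hypothetical class name and made-up inputs:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample inputs are illustrative assumptions.
class SpaceClassDemo {
    static final Pattern TEXT = Pattern.compile("[a-zA-Z0-9 ]*");

    // True if the input consists only of ASCII letters, digits and plain spaces.
    static boolean isPlainText(String s) {
        return TEXT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPlainText("hello world 42"));    // true
        System.out.println(isPlainText("tab\there"));         // false: tab is not in the class
        System.out.println(isPlainText("hello\u00A0world"));  // false: non-breaking space
    }
}
```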
You can check whether an input value contains only letters and digits by using the regex ^[a-zA-Z0-9]*$
If the value contains only letters and digits, it will show a match, e.g. riz99, riz99z
Otherwise it will show no match, e.g. 99z., riz99.z, riz99.9
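The answer's own example code is not preserved here, so the following is only a sketch of what such a check might look like, using the sample values listed above. The class and method names are assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical reconstruction; not the answerer's original code.
class AlnumCheckDemo {
    static final Pattern ALNUM = Pattern.compile("^[a-zA-Z0-9]*$");

    // Returns "match" when the value holds only letters and digits, else "not match".
    static String classify(String value) {
        return ALNUM.matcher(value).matches() ? "match" : "not match";
    }

    public static void main(String[] args) {
        String[] samples = {"riz99", "riz99z", "99z.", "riz99.z", "riz99.9"};
        for (String s : samples) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Note that with * this pattern also reports a match for an empty string; swapping in + avoids that.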