Best way to do string pattern matching in Java when performance matters
Greetings,
Let's say you want to test a string to see whether it is an exact match, or a match followed by an _ and any number of characters after the _.
Valid match examples:
MyTestString
MyTestString_
MyTestString_1234
If performance were a major concern, which approaches would you investigate? Currently I am doing the following:
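// testString is the fixed string to match (e.g. "MyTestString"); stringToMatch is the input
if (testString.equals(stringToMatch)) {
    // success: exact match
} else if (stringToMatch.contains(testString + "_")) {
    // success: testString followed by "_" (note: contains() also accepts text before testString)
} else {
    // fail
}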
I tried replacing the String.contains check with a java.util.regex.Pattern match on _*, but that performed much worse. Is my solution ideal, or can you think of something more clever to improve performance a bit more?
Thanks for any thoughts
Answers (3)
You can do something like
I assume you want the testString to appear even if you have a "_"?
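The snippet that followed "You can do something like" is missing from this copy. As a rough illustration only (not the answerer's original code), a startsWith-based check could look like this, assuming testString is the fixed string and the input is the string being tested, as in the question. The class and method names are hypothetical.

```java
public class PrefixMatch {
    // Hypothetical helper: true if s equals testString,
    // or if s is testString followed by '_' and any characters.
    static boolean matches(String s, String testString) {
        if (!s.startsWith(testString)) {
            return false;
        }
        // Either nothing follows the prefix, or the next character is '_'.
        // If the lengths differ here, s is longer, so charAt stays in bounds.
        return s.length() == testString.length()
                || s.charAt(testString.length()) == '_';
    }

    public static void main(String[] args) {
        String testString = "MyTestString";
        String[] inputs = {"MyTestString", "MyTestString_", "MyTestString_1234", "MyTestStringX", "xMyTestString_"};
        for (String s : inputs) {
            System.out.println(s + " -> " + matches(s, testString));
        }
    }
}
```

Unlike contains(), startsWith() rejects inputs that have extra text before testString, such as "xMyTestString_".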
EDIT: On whether to use one long condition or nested if statements, there is no difference in code or performance.
Both forms compile to the same code: if you run
javap -c
on the compiled class, the bytecode is identical.
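The two snippets the EDIT compares are missing here, so the methods below are hypothetical stand-ins: the same check written as nested if statements and as one && condition. The answer reports identical bytecode for such forms; this sketch only shows that they return the same results, and you can compile it and compare the two methods with javap -c to check the claim yourself.

```java
public class NestedVsCombined {
    // Nested form: outer check on the prefix, inner check on what follows it.
    static boolean nested(String s, String t) {
        if (s.startsWith(t)) {
            if (s.length() == t.length() || s.charAt(t.length()) == '_') {
                return true;
            }
        }
        return false;
    }

    // Combined form: the same two checks joined with &&.
    // && short-circuits, so the second check still runs only when the first passes.
    static boolean combined(String s, String t) {
        if (s.startsWith(t) && (s.length() == t.length() || s.charAt(t.length()) == '_')) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String t = "MyTestString";
        for (String s : new String[] {"MyTestString", "MyTestString_1", "MyTestStringX", "Other"}) {
            System.out.println(s + ": nested=" + nested(s, t) + " combined=" + combined(s, t));
        }
    }
}
```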
You could use regular expressions to match patterns. You can use
stringToMatch.matches(".*?_.*?")
which returns a boolean.
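Note that ".*?_.*?" accepts any string that contains an underscore (for example "Other_1") and rejects the bare "MyTestString", so on its own it does not express the question's rule. A regex that does, assuming testString is the fixed prefix from the question, quotes the prefix and makes the "_..." suffix optional. String.matches compiles its pattern on every call, so this sketch compiles the Pattern once and reuses it. The class name is hypothetical, and whether this beats the plain String checks is something to benchmark rather than assume.

```java
import java.util.regex.Pattern;

public class RegexMatch {
    // Compiled once: the literal prefix, optionally followed by '_' and any characters.
    // DOTALL lets '.' also match line terminators in the suffix.
    private final Pattern pattern;

    RegexMatch(String testString) {
        this.pattern = Pattern.compile(Pattern.quote(testString) + "(?:_.*)?", Pattern.DOTALL);
    }

    boolean test(String s) {
        // matches() requires the whole input to match, so no ^ or $ anchors are needed.
        return pattern.matcher(s).matches();
    }

    public static void main(String[] args) {
        RegexMatch m = new RegexMatch("MyTestString");
        for (String s : new String[] {"MyTestString", "MyTestString_", "MyTestString_1234", "MyTestStringX", "Other_1"}) {
            System.out.println(s + " -> " + m.test(s));
        }
    }
}
```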
I ran some benchmarks. This is the quickest I can get.
This will at least work correctly at reasonable performance.
Edit: I swapped the _ check and the startsWith check, since startsWith performs worse than the _ check.
Edit2: Fixed StringIndexOutOfBoundsException.
Edit3: Peter Lawrey is right that calling a.length() only once saves time: 2.2% in my case.
The latest benchmark shows mine is 88% faster than the OP's code and 10% faster than Peter Lawrey's.
Edit4: I replaced all str.length() calls with a local variable and ran a dozen more benchmarks. The results now vary so much between runs that it's impossible to say which code is faster, though my latest version seems to win by a notch.
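The benchmarked code itself is missing from this copy. A sketch that follows the edits described above might look like this: the cheap '_' check runs before startsWith, a length comparison keeps charAt in bounds (avoiding a StringIndexOutOfBoundsException), and each length is read once into a local variable. The names are hypothetical, and no performance figures are implied.

```java
public class FastMatch {
    // Hypothetical helper: exact match, or testString + "_" + anything.
    static boolean matches(String s, String testString) {
        int sLen = s.length();          // read each length once
        int tLen = testString.length();
        if (sLen == tLen) {
            // Same length: only an exact match qualifies.
            return s.equals(testString);
        }
        // Input must be longer than the prefix; checking that first keeps charAt in bounds.
        // The single-character '_' test runs before the more expensive startsWith.
        return sLen > tLen
                && s.charAt(tLen) == '_'
                && s.startsWith(testString);
    }

    public static void main(String[] args) {
        String testString = "MyTestString";
        for (String s : new String[] {"MyTestString", "MyTestString_", "MyTestString_1234", "MyTestStrinG_", "MyTest"}) {
            System.out.println(s + " -> " + matches(s, testString));
        }
    }
}
```

As Edit4 suggests, differences this small tend to disappear in run-to-run noise, so a dedicated benchmarking harness with JIT warm-up is a safer way to compare variants than a handful of timed runs.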