Best approach for string pattern matching in Java when performance matters

Posted on 2024-11-10 12:51:57

Greetings,

Let's say you want to test a string to see whether it is an exact match, or a match followed by an _ and any number of characters after the _.

Valid match examples:

MyTestString
MyTestString_
MyTestString_1234

If performance was a huge concern, which methods would you investigate? Currently I am doing the following:

// input is the string under test (name assumed); stringToMatch is the expected base string
if (input.equals(stringToMatch)) {
    // success
} else if (input.contains(stringToMatch + "_")) {
    // success
} else {
    // fail
}

I tried replacing the String.contains check with a java.util.regex.Pattern match on _*, but that performed much worse. Is my solution here ideal, or can you think of something more clever to improve performance a bit more?

Thanks for any thoughts

Comments (3)

冷了相思 2024-11-17 12:51:57

You can do something like

if (string.startsWith(testString)) {
    int len = testString.length();
    if (string.length() == len || string.charAt(len) == '_') {
        // success
    }
}

I assume you want the testString to appear even if you have a "_"?
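
The check above can be wrapped in a small method to make it easy to test against the examples from the question. This is only a sketch: the class name PrefixMatcher, the method name matchesBase, and the parameter names are made up for illustration and do not come from the original answer.

```java
// Hypothetical helper; all names here are illustrative assumptions.
final class PrefixMatcher {

    /**
     * Returns true if candidate equals base, or if candidate starts with
     * base immediately followed by an underscore.
     */
    static boolean matchesBase(String candidate, String base) {
        if (!candidate.startsWith(base)) {
            return false;
        }
        int len = base.length();
        // Either nothing follows the base, or the next character is '_'.
        return candidate.length() == len || candidate.charAt(len) == '_';
    }

    public static void main(String[] args) {
        String base = "MyTestString";
        String[] samples = {
            "MyTestString", "MyTestString_", "MyTestString_1234", "MyTestStringX", "Other"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + matchesBase(s, base));
        }
    }
}
```

Because startsWith runs first, charAt(len) is only reached when the candidate is at least as long as the base, and the length comparison short-circuits the case where they are equal, so the index is always in range.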


EDIT: On whether to use one long condition or nested if statements, there is no difference in code or performance.

public static void nestedIf(boolean a, boolean b) {
    if (a) {
        if (b) {
            System.out.println("a && b");
        }
    }
}

public static void logicalConditionIf(boolean a, boolean b) {
    if (a && b) {
        System.out.println("a && b");
    }
}

Both compile to the same bytecode, as javap -c shows:

public static void nestedIf(boolean, boolean);
  Code:
   0:   iload_0
   1:   ifeq    16
   4:   iload_1
   5:   ifeq    16
   8:   getstatic       #7; //Field java/lang/System.out:Ljava/io/PrintStream;
   11:  ldc     #8; //String a && b
   13:  invokevirtual   #9; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   16:  return

public static void logicalConditionIf(boolean, boolean);
  Code:
   0:   iload_0
   1:   ifeq    16
   4:   iload_1
   5:   ifeq    16
   8:   getstatic       #7; //Field java/lang/System.out:Ljava/io/PrintStream;
   11:  ldc     #8; //String a && b
   13:  invokevirtual   #9; //Method java/io/PrintStream.println:(Ljava/lang/String;)V
   16:  return

The compiled code is identical.

人间☆小暴躁 2024-11-17 12:51:57

You could use a regular expression to match the pattern: stringToMatch.matches(".*?_.*?") returns a boolean.
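
Note that ".*?_.*?" accepts any string containing an underscore and rejects the bare "MyTestString", and that String.matches compiles its pattern on every call. For comparison, here is a sketch of a regex that encodes the rule from the question and is compiled only once. The class name RegexPrefixMatcher and the hard-coded base "MyTestString" are assumptions for illustration; whether this comes close to the plain string checks would still have to be measured.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names and the fixed base string are assumptions.
final class RegexPrefixMatcher {

    // Compiled once and reused. Pattern.quote escapes any regex
    // metacharacters in the base; DOTALL lets '.' match line breaks too,
    // so any characters at all may follow the underscore.
    private static final Pattern BASE_PATTERN =
            Pattern.compile(Pattern.quote("MyTestString") + "(_.*)?", Pattern.DOTALL);

    /** True for the base itself, or the base followed by '_' and anything. */
    static boolean matchesBase(String candidate) {
        return BASE_PATTERN.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesBase("MyTestString_1234"));
        System.out.println(matchesBase("MyTestStringX"));
    }
}
```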

征﹌骨岁月お 2024-11-17 12:51:57

I ran some benchmarks. This is the quickest I can get.

    String a = "Test123";
    String b = "Test123_321tseT_Test_rest";
    int len1 = a.length();
    int len2 = b.length();
    if ((len1 == len2 || (len2 > len1 && (b.charAt(len1)) == '_'))
        && b.startsWith(a)) {
        System.out.println("success");
    } else {
        System.out.println("Fail");
    }

This will at least work correctly at reasonable performance.

Edit: I swapped the order of the _ check and the startsWith check, since startsWith performs worse than the _ check.

Edit2: Fixed StringIndexOutOfBoundsException.

Edit3: Peter Lawrey is correct that making only one call to a.length() saves time: 2.2% in my case.
The latest benchmark shows my version is 88% faster than the OP's and 10% faster than Peter Lawrey's code.

Edit4: I replaced every str.length() call with a local variable and ran a dozen more benchmarks. The results are now so noisy that it's impossible to say which code is faster. My latest version seems to win by a notch.
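
Noisy results like these are common with hand-rolled micro-benchmarks, since JIT compilation and garbage collection can dominate differences this small. The sketch below is not the benchmark used above (that code was not posted); it is a crude timing loop with a warm-up phase, and the class name CrudeBenchmark, the helper names, and the sample inputs are all assumptions. A dedicated harness such as JMH is generally a better choice for numbers you intend to rely on.

```java
import java.util.function.BiPredicate;

// Hypothetical timing sketch; names and inputs are illustrative assumptions.
final class CrudeBenchmark {

    // The length-first check from this answer.
    static boolean lengthFirst(String a, String b) {
        int len1 = a.length();
        int len2 = b.length();
        return (len1 == len2 || (len2 > len1 && b.charAt(len1) == '_'))
                && b.startsWith(a);
    }

    // An equals/contains check in the spirit of the question's code.
    static boolean containsBased(String a, String b) {
        return b.equals(a) || b.contains(a + "_");
    }

    // Runs the check over all inputs for the given number of rounds and
    // returns how many calls reported a match.
    static int countMatches(BiPredicate<String, String> check,
                            String a, String[] inputs, int rounds) {
        int hits = 0;
        for (int r = 0; r < rounds; r++) {
            for (String b : inputs) {
                if (check.test(a, b)) {
                    hits++;
                }
            }
        }
        return hits;
    }

    // Times one run of countMatches in nanoseconds.
    static long timeNanos(BiPredicate<String, String> check,
                          String a, String[] inputs, int rounds) {
        long start = System.nanoTime();
        int hits = countMatches(check, a, inputs, rounds);
        long elapsed = System.nanoTime() - start;
        // Use the result so the loop cannot be discarded as dead code.
        if (hits < 0) {
            throw new IllegalStateException("unreachable");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String a = "Test123";
        String[] inputs = {"Test123", "Test123_321tseT_Test_rest", "Test1234", "Tes"};
        int rounds = 1_000_000;

        // Warm-up so both checks are likely JIT-compiled before timing.
        timeNanos(CrudeBenchmark::lengthFirst, a, inputs, rounds);
        timeNanos(CrudeBenchmark::containsBased, a, inputs, rounds);

        System.out.println("lengthFirst ns:   "
                + timeNanos(CrudeBenchmark::lengthFirst, a, inputs, rounds));
        System.out.println("containsBased ns: "
                + timeNanos(CrudeBenchmark::containsBased, a, inputs, rounds));
    }
}
```

Running the timed section several times and comparing the spread between runs gives a rough idea of whether a difference of a few percent means anything.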
