Finding a string in a byte buffer

Posted 2024-12-23 01:30:30, 143 words, 3 views, 0 comments

I'm switching from C to Java. How can I find a string inside a ByteBuffer? Is there something like memchr in Java? The buffer is only partly a string; the rest is raw bytes, so any Java method has to work on both bytes and chars.

I am also looking for something like strsep in Java to split strings.

Comments (5)

儭儭莪哋寶赑 2024-12-30 01:30:30

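You can convert the ByteBuffer into a String and use indexOf, which is likely to work.

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
ByteBuffer bb = /* non-direct byte buffer */;
// ISO-8859-1 maps every byte to exactly one char, so String indexes line up with byte offsets.
String text = new String(bb.array(), bb.arrayOffset() + bb.position(), bb.remaining(), StandardCharsets.ISO_8859_1);
int index = text.indexOf(searchText); // relative to bb.position(); -1 if not found

This has a non-trivial overhead as it creates a String. The alternative is a brute-force search over the bytes, which will be faster but takes time to write.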
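
For reference, the brute-force alternative is short enough to sketch. This is only a minimal illustration, not a tuned implementation: the class name ByteSearch and the method indexOf are made up for this example, and it assumes the needle has already been encoded to bytes with the right charset. It uses the absolute get(int) so the buffer's position is left untouched, and it also works on direct buffers.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK.
final class ByteSearch {
    /**
     * Returns the absolute index of the first occurrence of needle between
     * buf.position() and buf.limit(), or -1 if there is none.
     */
    static int indexOf(ByteBuffer buf, byte[] needle) {
        int last = buf.limit() - needle.length;
        outer:
        for (int i = buf.position(); i <= last; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (buf.get(i + j) != needle[j]) {
                    continue outer; // mismatch: try the next start index
                }
            }
            return i; // every byte of needle matched
        }
        return -1;
    }

    public static void main(String[] args) {
        // Two raw bytes, then ASCII text, then another raw byte.
        byte[] data = {0x00, (byte) 0xFF, 'k', 'e', 'y', '=', 'v', 0x01};
        ByteBuffer buf = ByteBuffer.wrap(data);
        byte[] needle = "key=".getBytes(StandardCharsets.US_ASCII);
        System.out.println(indexOf(buf, needle)); // prints 2
    }
}
```

Unlike the String approach, the returned index here is absolute in the buffer rather than relative to its position.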

顾铮苏瑾 2024-12-30 01:30:30

You would need to encode the character string into bytes using the correct character encoding for your application. Then use a string search algorithm like Rabin-Karp or Boyer-Moore to find the resulting byte sequence within the buffer. Or, if your buffers are small, you could just perform a brute force search.

I'm not aware of any open source implementations of these search algorithms, and they aren't part of core Java.
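
To make the idea concrete, here is a rough sketch using Boyer-Moore-Horspool, a simplified variant of Boyer-Moore that keeps only the bad-character table. The class and method names (HorspoolSearch, indexOf) are invented for this example, and ISO-8859-1 is just an assumed encoding; substitute whatever charset your data actually uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the JDK.
final class HorspoolSearch {
    /**
     * Boyer-Moore-Horspool: returns the index of the first occurrence of
     * pattern within buf[off .. off+len), or -1 if there is no match.
     */
    static int indexOf(byte[] buf, int off, int len, byte[] pattern) {
        int m = pattern.length;
        if (m == 0) {
            return off;
        }
        // shift[b] = how far the window may move when the byte under the
        // pattern's last position is b. Bytes absent from the pattern shift by m.
        int[] shift = new int[256];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[pattern[i] & 0xFF] = m - 1 - i;
        }
        int end = off + len;
        int pos = off;
        while (pos + m <= end) {
            int j = m - 1;
            while (j >= 0 && buf[pos + j] == pattern[j]) {
                j--; // compare right to left
            }
            if (j < 0) {
                return pos;
            }
            pos += shift[buf[pos + m - 1] & 0xFF];
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = "xx\r\n--frontier\r\nbody".getBytes(StandardCharsets.ISO_8859_1);
        // Encode the search string with the same charset as the data.
        byte[] pattern = "--frontier".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(indexOf(data, 0, data.length, pattern)); // prints 4
    }
}
```

For an array-backed ByteBuffer bb you could call it as indexOf(bb.array(), bb.arrayOffset() + bb.position(), bb.remaining(), pattern); the result is then an index into the backing array.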

妄断弥空 2024-12-30 01:30:30

From "Fastest way to find a string in a text file with java":

The best implementation I've found is in MIMEParser: https://github.com/samskivert/ikvm-openjdk/blob/master/build/linux-amd64/impsrc/com/sun/xml/internal/org/jvnet/mimepull/MIMEParser.java

/**
  * Finds the boundary in the given buffer using Boyer-Moore algo.
  * Copied from java.util.regex.Pattern.java
  *
  * @param mybuf boundary to be searched in this mybuf
  * @param off start index in mybuf
  * @param len number of bytes in mybuf
  *
  * @return -1 if there is no match or index where the match starts
  */

  private int match(byte[] mybuf, int off, int len) {

It also needs:

  private void compileBoundaryPattern();

孤千羽 2024-12-30 01:30:30

The String class has a nice split method: String.split.
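
A small sketch of how that might look as a strsep replacement. The class name SplitExample and the helper fields are made up for illustration. Note that the argument to split is a regular expression, not a plain list of delimiter characters as in strsep.

```java
import java.util.Arrays;

// Hypothetical example class.
final class SplitExample {
    /** Splits on any one of ':', '|' or '-'. */
    static String[] fields(String s) {
        // Inside a character class, '|' is literal and a trailing '-' is literal too.
        return s.split("[:|-]");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fields("abc:def-ghi|jkl"))); // prints [abc, def, ghi, jkl]
    }
}
```

Like strsep, split yields an empty field between two adjacent delimiters. Unlike strsep, it drops trailing empty fields unless you pass a negative limit, as in s.split("[:|-]", -1).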

筱果果 2024-12-30 01:30:30

One option is to use a StringTokenizer, which can split the string into an iterable collection of tokens according to given delimiter(s). The tokens collection can contain the delimiter if needed. Example:

String s = "abc:def-ghi|jkl";
StringTokenizer tokenizer = new StringTokenizer(s, ":-|");
while (tokenizer.hasMoreTokens()) {
  System.out.print(tokenizer.nextToken());
}

Expected result:

abcdefghijkl
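
To also get the delimiters back, StringTokenizer has a three-argument constructor whose last parameter, returnDelims, makes each delimiter character come out as its own one-character token. A minimal sketch; the class name TokenizerWithDelims and the helper describe are invented for this example:

```java
import java.util.StringTokenizer;

// Hypothetical example class.
final class TokenizerWithDelims {
    /** Wraps every token, delimiters included, in square brackets. */
    static String describe(String s, String delims) {
        // true = return the delimiter characters as tokens as well
        StringTokenizer tokenizer = new StringTokenizer(s, delims, true);
        StringBuilder out = new StringBuilder();
        while (tokenizer.hasMoreTokens()) {
            out.append('[').append(tokenizer.nextToken()).append(']');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("abc:def-ghi|jkl", ":-|"));
        // prints [abc][:][def][-][ghi][|][jkl]
    }
}
```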
