
Finding a string in a byte buffer

Posted 2024-12-23 01:30:30

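I'm moving from C to Java. How do I find a string in a byte buffer? Is there something like memchr in Java? Only part of the byte buffer is a string and the rest is raw bytes, so any Java method has to handle both bytes and chars.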

I'm also looking for something like strsep in Java to split strings.


儭儭莪哋寶赑 2024-12-30 01:30:30

You can convert the ByteBuffer into a String and use indexOf, which will likely work.

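import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
ByteBuffer bb = /* non-direct (array-backed) byte buffer */;
// ISO-8859-1 maps each byte to exactly one char, so string indexes match byte offsets from position()
String text = new String(bb.array(), bb.arrayOffset() + bb.position(), bb.remaining(), StandardCharsets.ISO_8859_1);
int index = text.indexOf(searchText);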

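This has non-trivial overhead because it creates a String. The alternative is a brute-force string search, which will be faster but takes time to write.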
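A minimal sketch of the brute-force alternative, searching the remaining bytes of a ByteBuffer directly without building a String. The names `ByteBufferSearch` and `indexOf` are hypothetical, and the code assumes the needle has already been converted to bytes in the same encoding as the text part of the buffer. With a one-byte needle it behaves roughly like C's memchr. It reads with the absolute `get(int)`, so it should also work on direct buffers and leaves `position()` unchanged.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ByteBufferSearch {
    /**
     * Brute-force search for needle in the remaining bytes of buf
     * (from position() to limit()). Returns the absolute buffer index
     * of the first match, or -1. Does not change the buffer's position.
     */
    static int indexOf(ByteBuffer buf, byte[] needle) {
        int start = buf.position();
        int last = buf.limit() - needle.length; // last index where a match can begin
        outer:
        for (int i = start; i <= last; i++) {
            for (int j = 0; j < needle.length; j++) {
                // absolute get(int) does not move position(); works for direct buffers too
                if (buf.get(i + j) != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(16);
        // two raw bytes, some ASCII text, then another raw byte
        bb.put(new byte[] {0x01, 0x02}).put("key=".getBytes(StandardCharsets.US_ASCII)).put((byte) 0x7f);
        bb.flip(); // position = 0, limit = 7
        System.out.println(indexOf(bb, "key".getBytes(StandardCharsets.US_ASCII))); // prints 2
    }
}
```

The worst case is O(n·m) comparisons, which is usually acceptable for small buffers and short needles.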


顾铮苏瑾 2024-12-30 01:30:30

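You need to encode the string as bytes, using the character encoding that is correct for your application. Then use a string-search algorithm such as Rabin-Karp or Boyer-Moore to find the resulting byte sequence in the buffer. Alternatively, if your buffer is small, you can just do a brute-force search.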

I don't know of any open-source implementations of these search algorithms, and they are not part of core Java.
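A minimal sketch of the encode-then-search approach, assuming the needle is encoded with the same charset as the text portion of the buffer. Note that this uses Boyer-Moore-Horspool, a simplified Boyer-Moore variant that keeps only the bad-character shift table; it is not the full Boyer-Moore or Rabin-Karp algorithm. The class name `HorspoolSearch` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class HorspoolSearch {
    /**
     * Boyer-Moore-Horspool search. Returns the index of the first occurrence
     * of pattern within text[from, to), or -1 if there is none.
     */
    static int indexOf(byte[] text, int from, int to, byte[] pattern) {
        int m = pattern.length;
        if (m == 0) return from;
        // shift[b] = how far the window may slide when the byte under its last slot is b
        int[] shift = new int[256];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[pattern[i] & 0xff] = m - 1 - i; // & 0xff treats bytes as unsigned 0..255
        }
        int pos = from;
        while (pos <= to - m) {
            int j = m - 1;
            // compare the window right to left
            while (j >= 0 && text[pos + j] == pattern[j]) {
                j--;
            }
            if (j < 0) return pos;
            pos += shift[text[pos + m - 1] & 0xff];
        }
        return -1;
    }

    public static void main(String[] args) {
        // raw bytes mixed with ASCII text
        byte[] buf = {(byte) 0xff, 0x00, 'C', 'o', 'n', 't', 'e', 'n', 't', ':', (byte) 0x80};
        byte[] needle = "tent".getBytes(StandardCharsets.US_ASCII);
        System.out.println(indexOf(buf, 0, buf.length, needle)); // prints 5
    }
}
```

Horspool's worst case is still O(n·m), like brute force, but on typical inputs a mismatch lets the window jump ahead by up to the pattern length.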


妄断弥空 2024-12-30 01:30:30

From "Fastest way to find a string in a text file with java":

The best implementation I've found is in MIMEParser: https://github.com/samskivert/ikvm-openjdk/blob/master/build/linux-amd64/impsrc/com/sun/xml/internal/org/jvnet/mimepull/MIMEParser.java

/**
  * Finds the boundary in the given buffer using Boyer-Moore algo.
  * Copied from java.util.regex.Pattern.java
  *
  * @param mybuf boundary to be searched in this mybuf
  * @param off start index in mybuf
  * @param len number of bytes in mybuf
  *
  * @return -1 if there is no match or index where the match starts
  */

  private int match(byte[] mybuf, int off, int len) {
      // ... (method body omitted here; see the linked MIMEParser.java)
  }

Also needed:

  private void compileBoundaryPattern();


孤千羽 2024-12-30 01:30:30

The String class has a handy split method: String.split.
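A minimal sketch of using String.split as a strsep replacement; `SplitDemo.tokens` is a hypothetical helper. Keep in mind that split takes a regular expression, so characters such as `|` must be escaped or placed in a character class. A negative limit keeps trailing empty strings, and empty tokens between adjacent delimiters are returned, much as strsep returns them.

```java
import java.util.Arrays;

final class SplitDemo {
    /** Splits on ':', '-' or '|', keeping every empty token (limit -1). */
    static String[] tokens(String s) {
        // "[:\\-|]" is a regex character class; '-' is escaped, '|' is literal inside [...]
        return s.split("[:\\-|]", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokens("abc:def--ghi|jkl|")));
        // prints [abc, def, , ghi, jkl, ]
    }
}
```

With the default `split(regex)` (limit 0), the trailing empty string would be dropped.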


筱果果 2024-12-30 01:30:30

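One option is to use a StringTokenizer, which splits a string into tokens according to the given delimiter(s); you step through them with hasMoreTokens() and nextToken(). If needed, the delimiters themselves can be returned as tokens too. Example: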

import java.util.StringTokenizer;

String s = "abc:def-ghi|jkl";
StringTokenizer tokenizer = new StringTokenizer(s, ":-|");
while (tokenizer.hasMoreTokens()) {
  System.out.print(tokenizer.nextToken());
}

Expected result:

abcdefghijkl
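To get the delimiters back as well, pass `true` as the third (`returnDelims`) constructor argument; each delimiter is then returned as its own one-character token. The helper name `tokensWithDelims` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class TokenizerDemo {
    /** Returns the tokens of s and, because returnDelims is true, each delimiter as well. */
    static List<String> tokensWithDelims(String s, String delims) {
        StringTokenizer tokenizer = new StringTokenizer(s, delims, true);
        List<String> out = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            out.add(tokenizer.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokensWithDelims("abc:def-ghi|jkl", ":-|"));
        // prints [abc, :, def, -, ghi, |, jkl]
    }
}
```

The StringTokenizer Javadoc describes it as a legacy class and recommends String.split or the java.util.regex package for new code. Unlike split, with returnDelims false it silently skips empty tokens between adjacent delimiters.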
