Finding a string in a ByteBuffer
I'm switching from C to Java. How do I find a string inside a ByteBuffer? Is there something like memchr in Java? The ByteBuffer is only partly a string; the rest is raw bytes, so any Java method has to work on bytes as well as chars.
I am also looking for something like strsep in Java to split strings.

Comments (5)
You can convert the ByteBuffer into a String and use indexOf, which is likely to work.
This has a non-trivial overhead, as it creates a String. The alternative is a brute-force search, which will be faster but takes time to write.
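A minimal sketch of both approaches, assuming the data to search lies between the buffer's position and limit. Decoding with ISO-8859-1 (an assumption, chosen because it maps every byte 0x00-0xFF to exactly one char) makes String.indexOf return a byte offset even when the rest of the buffer is raw binary; it only finds needles whose characters are all in that range. The class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ByteBufferSearch {

    // Approach 1: decode the remaining bytes into a String and use indexOf.
    // ISO-8859-1 maps each byte to one char, so char index == byte offset.
    // Returns the offset relative to buf.position(), or -1 if not found.
    static int indexOfViaString(ByteBuffer buf, String needle) {
        ByteBuffer view = buf.duplicate(); // independent position; caller's buffer untouched
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        String haystack = new String(bytes, StandardCharsets.ISO_8859_1);
        return haystack.indexOf(needle);
    }

    // Approach 2: brute-force byte comparison, no String allocation.
    // Uses absolute get(int), so the buffer's position is not modified.
    // Returns the offset relative to buf.position(), or -1 if not found.
    static int indexOfBruteForce(ByteBuffer buf, byte[] pattern) {
        int start = buf.position();
        int limit = buf.limit();
        outer:
        for (int i = start; i <= limit - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (buf.get(i + j) != pattern[j]) {
                    continue outer; // mismatch: try the next starting offset
                }
            }
            return i - start;
        }
        return -1;
    }
}
```

The brute-force version is O(n·m) in the worst case, but for small buffers or short patterns it is usually adequate and avoids copying the buffer.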
You would need to encode the character string into bytes using the correct character encoding for your application. Then use a string search algorithm like Rabin-Karp or Boyer-Moore to find the resulting byte sequence within the buffer. Or, if your buffers are small, you could just perform a brute force search.
I'm not aware of any open source implementations of these search algorithms, and they aren't part of core Java.
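As a sketch of the encode-then-search approach, the code below encodes the pattern with a caller-supplied charset and searches the raw bytes with Boyer-Moore-Horspool, a simplified variant of Boyer-Moore that keeps only the bad-character shift table. Because it compares bytes, the non-text part of the buffer does not need to be valid in any encoding. The class name is illustrative, and the charset you pass (UTF-8 in the usage below) is an assumption that must match whatever encoding your data actually uses.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.Arrays;

final class HorspoolSearch {

    // Returns the offset of the first occurrence of pattern in the buffer's
    // remaining bytes, relative to buf.position(), or -1 if absent.
    // Only absolute get(int) is used, so position and limit are unchanged.
    static int indexOf(ByteBuffer buf, byte[] pattern) {
        int m = pattern.length;
        int start = buf.position();
        int limit = buf.limit();
        if (m == 0) {
            return 0;
        }
        // Bad-character table: for each byte value, how far the window may
        // shift when that byte sits under the pattern's last position.
        int[] shift = new int[256];
        Arrays.fill(shift, m);
        for (int k = 0; k < m - 1; k++) {
            shift[pattern[k] & 0xFF] = m - 1 - k;
        }
        int i = start;
        while (i <= limit - m) {
            // Compare the current window right to left.
            int j = m - 1;
            while (j >= 0 && buf.get(i + j) == pattern[j]) {
                j--;
            }
            if (j < 0) {
                return i - start;
            }
            i += shift[buf.get(i + m - 1) & 0xFF];
        }
        return -1;
    }

    // Encodes the text with the given charset, then searches the raw bytes.
    static int indexOf(ByteBuffer buf, String text, Charset charset) {
        return indexOf(buf, text.getBytes(charset));
    }
}
```

For example, `HorspoolSearch.indexOf(buf, "Content-Length", StandardCharsets.UTF_8)` searches for the UTF-8 encoding of that header name. Full Boyer-Moore adds a good-suffix table on top of this; Horspool omits it for simplicity.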
From "Fastest way to find a string in a text file with java":
The best implementation I've found is in MIMEParser: https://github.com/samskivert/ikvm-openjdk/blob/master/build/linux-amd64/impsrc/com/sun/xml/internal/org/jvnet/mimepull/MIMEParser.java
Also needed:
The String class has a convenient split method, String.split, which splits a string around matches of a regular expression.
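A short sketch of String.split used as a strsep replacement. Two differences from strsep are worth knowing: the argument is a regular expression (so metacharacters such as `|` must be quoted), and the one-argument form drops trailing empty strings, while a negative limit keeps them. The helper name `splitKeepEmpty` is hypothetical, and it treats the delimiter as one literal string rather than as a set of single characters the way strsep does.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class SplitDemo {

    // Splits on a literal delimiter and keeps every field, including empty ones,
    // similar to calling strsep repeatedly with a single delimiter character.
    static String[] splitKeepEmpty(String s, String delimiter) {
        // Pattern.quote makes regex metacharacters such as "|" or "." literal.
        return s.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        String line = "key=value;;flag;";
        // One-argument form: regex delimiter, trailing empty strings dropped.
        System.out.println(Arrays.toString(line.split(";")));              // [key=value, , flag]
        // Negative limit keeps the trailing empty field as well.
        System.out.println(Arrays.toString(splitKeepEmpty(line, ";")));    // [key=value, , flag, ]
        System.out.println(Arrays.toString(splitKeepEmpty("a|b|c", "|"))); // [a, b, c]
    }
}
```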
One option is to use a StringTokenizer, which can split the string into an iterable collection of tokens according to the given delimiter(s). The token collection can include the delimiters if needed.
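A minimal sketch of StringTokenizer with the three-argument constructor, where the final `true` asks for each delimiter character to be returned as its own token. Like strsep, the delimiter string is treated as a set of single characters; unlike strsep, consecutive delimiters produce no empty tokens. The class and helper names are illustrative. Note that the StringTokenizer Javadoc describes it as a legacy class and recommends String.split or java.util.regex for new code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class TokenizerDemo {

    // Collects the tokens of s; each character in delimiters is a delimiter.
    // When returnDelims is true, every delimiter character becomes a token too.
    static List<String> tokens(String s, String delimiters, boolean returnDelims) {
        StringTokenizer st = new StringTokenizer(s, delimiters, returnDelims);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // The double space yields no empty token.
        System.out.println(tokens("GET /index.html  HTTP/1.1", " ", false));
        // [GET, /index.html, HTTP/1.1]
        System.out.println(tokens("a=1;b=2", "=;", true));
        // [a, =, 1, ;, b, =, 2]
    }
}
```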