How to find the starting index of a string in a UTF-8 byte array? (C#)
I have a UTF-8 byte array of data. I would like to search for a specific string in the array of bytes in C#.
byte[] dataArray = (some UTF-8 byte array of data);
string searchString = "Hello";
How do I find the first occurrence of the word "Hello" in the array dataArray and return an index location where the string begins (where the 'H' from 'Hello' would be located in dataArray)?
Before, I was erroneously using something like:
int helloIndex = Encoding.UTF8.GetString(dataArray).IndexOf("Hello");
Obviously, that code would not be guaranteed to work since I am returning the index of a String, not the index of the UTF-8 byte array. Are there any built-in C# methods or proven, efficient code I can reuse?
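To make the mismatch concrete, here is a small illustrative sketch. The class `IndexMismatchDemo` and its `Compare` method are hypothetical names, not part of .NET. It runs the original `GetString(...).IndexOf(...)` approach and then computes the real byte offset by measuring the UTF-8 length of the text before the match. The two values diverge as soon as a multi-byte character such as "é" (two bytes in UTF-8, one `char` in the string) precedes the match.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only: contrasts the char index returned
// by string.IndexOf with the byte offset of the same match in the UTF-8 data.
public static class IndexMismatchDemo
{
    public static (int CharIndex, int ByteIndex) Compare(string text, string search)
    {
        byte[] data = Encoding.UTF8.GetBytes(text);

        // What the original approach returns: a position in UTF-16 code units.
        int charIndex = Encoding.UTF8.GetString(data).IndexOf(search, StringComparison.Ordinal);
        if (charIndex < 0) return (-1, -1);

        // The actual byte offset: the UTF-8 length of everything before the match.
        int byteIndex = Encoding.UTF8.GetByteCount(text.Substring(0, charIndex));
        return (charIndex, byteIndex);
    }
}
```

For example, `IndexMismatchDemo.Compare("café Hello", "Hello")` yields a char index of 5 but a byte index of 6, while for pure ASCII text the two indices coincide.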
Thanks,
Matt
One of the nice features about UTF-8 is that if a sequence of bytes represents a character and that sequence of bytes appears anywhere in valid UTF-8 encoded data then it always represents that character.
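The property follows from how UTF-8 marks its bytes: continuation bytes always have the bit pattern `10xxxxxx`, while every byte that starts a character is either ASCII (`0xxxxxxx`) or a lead byte (`11xxxxxx`). An encoded search string therefore always begins with a non-continuation byte, so a byte-for-byte match in valid UTF-8 data cannot start in the middle of a character. A minimal sketch of that check, with `Utf8Boundaries` as a hypothetical helper name:

```csharp
// Hypothetical helper for illustration: classifies bytes by their UTF-8 role.
public static class Utf8Boundaries
{
    // Continuation bytes are exactly those of the form 10xxxxxx.
    public static bool IsContinuationByte(byte b) => (b & 0xC0) == 0x80;

    // True if the byte at 'index' begins a character (assumes valid UTF-8 data).
    public static bool StartsCharacter(byte[] data, int index) =>
        index >= 0 && index < data.Length && !IsContinuationByte(data[index]);
}
```

For instance, "é" encodes as `0xC3 0xA9`: index 0 starts the character, index 1 is a continuation byte.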
Knowing this, you can convert the string you are searching for to a byte array and then use the Boyer-Moore string searching algorithm (or any other string searching algorithm you like) adapted slightly to work on byte arrays instead of strings.
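A sketch of that approach follows. It uses Boyer-Moore-Horspool, a simplified variant of Boyer-Moore that keeps only the bad-character shift table, applied to bytes instead of chars. The class name `Utf8Search` is hypothetical. The search is ordinal: it compares encoded bytes, so it is case-sensitive and will not match canonically equivalent but differently normalized text (e.g. a precomposed "é" versus "e" plus a combining accent).

```csharp
using System;
using System.Text;

// Hypothetical helper: byte-level search for a string in UTF-8 data.
public static class Utf8Search
{
    // Returns the byte offset of the first occurrence of 'value' (encoded as
    // UTF-8) in 'data', or -1 if it does not occur.
    public static int IndexOf(byte[] data, string value)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (value == null) throw new ArgumentNullException(nameof(value));
        return IndexOf(data, Encoding.UTF8.GetBytes(value));
    }

    // Boyer-Moore-Horspool search over raw bytes.
    public static int IndexOf(byte[] data, byte[] pattern)
    {
        int m = pattern.Length;
        if (m == 0) return 0;
        if (m > data.Length) return -1;

        // Bad-character table: shift distance keyed by the data byte aligned with
        // the pattern's last position. Bytes absent from pattern[0..m-2] shift by m.
        int[] shift = new int[256];
        for (int i = 0; i < shift.Length; i++) shift[i] = m;
        for (int i = 0; i < m - 1; i++) shift[pattern[i]] = m - 1 - i;

        int pos = 0;
        while (pos <= data.Length - m)
        {
            // Compare right to left within the current window.
            int j = m - 1;
            while (j >= 0 && data[pos + j] == pattern[j]) j--;
            if (j < 0) return pos;
            pos += shift[data[pos + m - 1]];
        }
        return -1;
    }
}
```

Usage: `int helloIndex = Utf8Search.IndexOf(dataArray, "Hello");`. On newer .NET versions with the Span APIs, `dataArray.AsSpan().IndexOf(Encoding.UTF8.GetBytes("Hello"))` (`MemoryExtensions.IndexOf`) performs the same ordinal byte search without custom code.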
There are a number of answers here that can help you:
Try the following snippet: