
Buffer String C# search arrays

How do I find the starting index of a string within a UTF-8 byte array? (C#)

Posted 2024-09-28 19:10:37

I have an array of UTF-8 byte data, and I want to search that byte array for a particular string in C#.

byte[] dataArray = (some array of UTF-8 byte data);

string searchString = "Hello";

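How can I find the first occurrence of the word "Hello" in dataArray and return the index at which the string starts (that is, the position in dataArray of the "H" in "Hello")?

Previously, I was mistakenly using something like this: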

int helloIndex = Encoding.UTF8.GetString(dataArray).IndexOf("Hello");

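Obviously, that code is not guaranteed to work, because it returns an index into the String rather than an index into the UTF-8 byte array: any character before the match that takes more than one byte in UTF-8 makes the two indexes differ. Is there a built-in C# method, or proven, efficient code, that I can reuse?

Thanks,

Matt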


Comments (2)

执妄 2024-10-05 19:10:37

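One nice property of UTF-8 is that if a sequence of bytes represents a character, and that byte sequence appears anywhere in valid UTF-8 encoded data, then it always represents that character.

Knowing this, you can convert the string you are searching for into a byte array and then use the Boyer-Moore string search algorithm (or any other string search algorithm you like), slightly adapted to work on byte arrays instead of strings.

There are plenty of answers here that can help you:

byte[] array pattern search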
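As a rough illustration of the approach described above, here is a minimal sketch that encodes the search string with Encoding.UTF8.GetBytes and then scans the byte array for that byte sequence. The class name Utf8Search and the methods IndexOfBytes and IndexOfString are hypothetical names chosen for this example, not part of any library. The scan is a plain nested loop rather than Boyer-Moore, so it is simpler but slower on large inputs, and it assumes the data is valid UTF-8.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not from the original answer.
public static class Utf8Search
{
    // Returns the index of the first occurrence of needle in haystack,
    // or -1 if it does not occur. Naive O(n * m) scan, not Boyer-Moore.
    public static int IndexOfBytes(byte[] haystack, byte[] needle)
    {
        if (needle.Length == 0) return 0;

        for (int i = 0; i <= haystack.Length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i; // every needle byte matched at offset i
        }
        return -1;
    }

    // Encodes the search string as UTF-8 and searches for those bytes,
    // so the result is a byte offset into utf8Data.
    public static int IndexOfString(byte[] utf8Data, string searchString)
    {
        return IndexOfBytes(utf8Data, Encoding.UTF8.GetBytes(searchString));
    }
}
```

For example, Utf8Search.IndexOfString(Encoding.UTF8.GetBytes("ʤHello"), "Hello") returns 2, because "ʤ" occupies two bytes in UTF-8. On runtimes that provide System.Memory, the span-based MemoryExtensions.IndexOf over a ReadOnlySpan<byte> may serve as a built-in replacement for the inner loop.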


趁年轻赶紧闹 2024-10-05 19:10:37

Try the following snippet:

// Set up our little test.
string sourceText = "ʤhello";
byte[] searchBytes = Encoding.UTF8.GetBytes(sourceText);

// Convert the bytes into a string we can search in.
string searchText = Encoding.UTF8.GetString(searchBytes);
int position = searchText.IndexOf("hello");

// Get all text that is before the position we found.
string before = searchText.Substring(0, position);

// The length of the encoded bytes is the actual number of UTF-8 bytes
// instead of the character position.
int bytesBefore = Encoding.UTF8.GetBytes(before).Length;

// This outputs "Position is 1 and before is 2".
Console.WriteLine("Position is {0} and before is {1}", position, bytesBefore);
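The snippet above can be wrapped into a reusable method. The following is a minimal sketch under a few assumptions: the class name Utf8Offsets and the method ByteIndexOf are hypothetical, StringComparison.Ordinal is added so the match does not depend on the current culture, and the input is assumed to be valid UTF-8 (invalid bytes are decoded to replacement characters, which would shift the computed offset).

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; wraps the decode-then-measure idea.
public static class Utf8Offsets
{
    // Returns the byte offset in utf8Data at which searchString first starts,
    // or -1 if it is not found. Assumes utf8Data is valid UTF-8.
    public static int ByteIndexOf(byte[] utf8Data, string searchString)
    {
        // Decode the whole array so we can use the normal string search.
        string text = Encoding.UTF8.GetString(utf8Data);

        // Ordinal comparison is an assumption added here to avoid
        // culture-sensitive matching.
        int charIndex = text.IndexOf(searchString, StringComparison.Ordinal);
        if (charIndex < 0) return -1;

        // Re-measure the text before the match in UTF-8 bytes to turn the
        // character index into a byte index.
        return Encoding.UTF8.GetByteCount(text.Substring(0, charIndex));
    }
}
```

Note that this decodes the entire array and allocates a substring, so for large buffers a direct byte-level search is likely to be cheaper.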

