Fast way to check for substrings

Posted 2024-12-12 02:34:02

I'm currently programming a chat system based on a server - client model and using TCP as the communication protocol. Although it's working as expected, I'd like to further optimize important parts on the server side.

The server uses four extra threads to handle new connections, console input, etc, without blocking normal chat conversations. Well, there is only one thread for all messages that are being sent from client to client, so I assume it would be good to optimize the code there, as it would be the most obvious bottleneck. After reading the data on each client's socket, the data has to be processed using different steps. One of those steps would be to check for blocked words. And that's where my original question starts.


I played with std::string::find() and the strstr() function. According to my tests, std::string::find() was clearly faster than the old C-style strstr() function.

I know that the std::string is very well optimized, but C-style char arrays and their own functions always seemed to be somewhat faster, especially if the string has to be constructed over and over again.

So, is there anything faster than std::string::find() to scan a series of characters for blocked words? Is std::string::find() faster than strstr(), or are my benchmarks lousy? I know that the gain may be negligible compared to the effort needed to keep C-style char arrays and their functions clean, but I'd like to keep it as fast as possible, even if it is just for testing purposes.


EDIT: Sorry, forgot to mention that I am using MSVC++2010 Express. I am only targeting Windows machines.
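For reference, the two approaches being compared can be sketched roughly as below. This is only an illustration, not the asker's code: the function names `containsBlockedFind` and `containsBlockedStrstr` and the `blocked` word list are hypothetical. One thing worth checking in a benchmark like this is whether a std::string is constructed inside the timed loop, since that cost is separate from the search itself.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: scans a std::string message with std::string::find().
// The message length is already known, so no terminator scan is needed.
bool containsBlockedFind(const std::string& message,
                         const std::vector<std::string>& blocked) {
    for (const std::string& word : blocked) {
        if (message.find(word) != std::string::npos) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: scans a null-terminated buffer with strstr().
// strstr() relies on the terminating '\0' to know where the message ends.
bool containsBlockedStrstr(const char* message,
                           const std::vector<std::string>& blocked) {
    for (const std::string& word : blocked) {
        if (std::strstr(message, word.c_str()) != nullptr) {
            return true;
        }
    }
    return false;
}
```

Both functions should return the same answer for the same input; only the representation of the message differs.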

Comments (3)

不及他 2024-12-19 02:34:02

Have you benchmarked to verify that lots of time is in fact being taken in the check for blocked words? My completely naive guess is you're gonna be spending lots more time waiting for RPCs than any local processing...
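One rough way to check this is to time the blocked-word scan on its own and compare it with the time spent on socket I/O. The sketch below assumes a compiler that ships `<chrono>` (C++11), which MSVC++ 2010 does not; `ScanResult` and `timeBlockedWordScan` are hypothetical names used only for illustration.

```cpp
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: number of messages containing the word,
// plus the wall-clock time the scan took.
struct ScanResult {
    std::size_t hits;
    std::chrono::microseconds elapsed;
};

// Hypothetical helper: times a std::string::find() scan over all messages.
ScanResult timeBlockedWordScan(const std::vector<std::string>& messages,
                               const std::string& word) {
    const auto start = std::chrono::steady_clock::now();
    std::size_t hits = 0;
    for (const std::string& msg : messages) {
        if (msg.find(word) != std::string::npos) {
            ++hits;
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return ScanResult{
        hits,
        std::chrono::duration_cast<std::chrono::microseconds>(stop - start)};
}
```

For a meaningful number, the scan would need to run over a realistic volume of messages in an optimized build, and the result should be compared against measured network wait time rather than judged in isolation.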

过期以后 2024-12-19 02:34:02

Have you tried the regular expressions library in either C++11 if you use that, or Boost if you don't? I'm not sure about the speed, but I believe they perform quite well. Additionally, if you are using this as a form of profanity filter, you'd want regular expressions anyway to prevent trivial circumvention.
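A minimal sketch of what this could look like with std::regex follows. It assumes each blocked word is non-empty and consists only of letters and digits (so no regex escaping is needed); `makeBlockedWordRegex` and `containsBlockedWord` are hypothetical names. The pattern ignores case and tolerates punctuation, spaces, or underscores between letters, which covers only the simplest evasions.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: builds a case-insensitive regex for `word` that also
// matches when non-word characters or underscores separate its letters.
// Assumption: `word` is non-empty and contains only letters and digits.
std::regex makeBlockedWordRegex(const std::string& word) {
    std::string pattern;
    for (std::size_t i = 0; i < word.size(); ++i) {
        if (i != 0) {
            pattern += "[\\W_]*";  // optional separators between letters
        }
        pattern += word[i];
    }
    return std::regex(pattern, std::regex::ECMAScript | std::regex::icase);
}

// Returns true if the message contains a match anywhere.
bool containsBlockedWord(const std::string& message, const std::regex& blocked) {
    return std::regex_search(message, blocked);
}
```

Building the regex once per blocked word and reusing it for every message avoids paying the construction cost on each check. Note that this is still a substring match, so a longer word containing the blocked word also matches.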

吻泪 2024-12-19 02:34:02

There are faster search algorithms than the linear search typically used in the STL or strstr.

Boyer-Moore is quite popular. It requires preprocessing the pattern you search for (here, each blocked word), which should be feasible for your use case.

Exact string matching algorithms is a free e-book with an in-depth description of different search algorithms and their trade-offs.

Implementing more advanced algorithms could take considerable effort.
As said in the other answers, it is doubtful that string searching is a bottleneck in your chat server.
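To give an idea of the effort involved, here is a sketch of Horspool's simplification of Boyer-Moore, which keeps only the bad-character shift table. This is not the full Boyer-Moore algorithm, and the class name `HorspoolSearcher` is hypothetical. The table is built once per blocked word and reused for every message. On a C++17 compiler, std::boyer_moore_searcher in `<functional>` provides a ready-made alternative, but that is not available in MSVC++ 2010.

```cpp
#include <array>
#include <climits>
#include <cstddef>
#include <string>

// Boyer-Moore-Horspool: preprocesses the pattern into a shift table so the
// search can skip ahead by more than one character on a mismatch.
class HorspoolSearcher {
public:
    explicit HorspoolSearcher(const std::string& pattern) : pattern_(pattern) {
        const std::size_t m = pattern_.size();
        shift_.fill(m);  // characters not in the pattern allow a full skip
        // The last pattern character is excluded so every shift is >= 1.
        for (std::size_t i = 0; i + 1 < m; ++i) {
            shift_[static_cast<unsigned char>(pattern_[i])] = m - 1 - i;
        }
    }

    // Returns the index of the first occurrence, or std::string::npos.
    std::size_t find(const std::string& text) const {
        const std::size_t m = pattern_.size();
        const std::size_t n = text.size();
        if (m == 0) return 0;
        if (m > n) return std::string::npos;

        std::size_t pos = 0;
        while (pos <= n - m) {
            // Compare the window right to left.
            std::size_t j = m;
            while (j > 0 && pattern_[j - 1] == text[pos + j - 1]) {
                --j;
            }
            if (j == 0) return pos;
            // Shift based on the text character under the window's last slot.
            pos += shift_[static_cast<unsigned char>(text[pos + m - 1])];
        }
        return std::string::npos;
    }

private:
    std::string pattern_;
    std::array<std::size_t, UCHAR_MAX + 1> shift_;
};
```

The skip table pays off mainly for longer patterns; for very short blocked words the gain over a plain linear search may be small.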
