Which is faster: multiple send calls or using a buffer?

Posted on 2024-08-29 01:28:38

I'm playing around with sockets in C/Python, and I wonder what the most efficient way is to send headers from a Python dictionary to the client socket.

My ideas:

  1. Use a send call for every header. Pros: no memory allocation needed. Cons: many send calls, which is probably error-prone; error handling would get rather complicated.
  2. Use a buffer. Pros: one send call, so error checking is a lot easier. Cons: needs a buffer :-) malloc/realloc should be rather slow, and using an oversized buffer to avoid realloc calls wastes memory.

Any tips for me? Thanks :-)

Comments (3)

碍人泪离人颜 2024-09-05 01:28:38

Because of the way TCP congestion control works, it's more efficient to send data all at once. TCP maintains a window of how much data it will allow to be "in the air" (sent but not yet acknowledged). TCP measures the acknowledgments coming back to figure out how much data it can have "in the air" without causing congestion (i.e., packet loss). If there isn't enough data coming from the application to fill the window, TCP can't make accurate measurements so it will conservatively shrink the window.

If you only have a few small headers and your calls to send come in rapid succession, the operating system will typically buffer the data for you and send it all in one packet. In that case, TCP congestion control isn't really an issue. However, each call to send is a system call, which means a switch from user mode to kernel mode and back, and that incurs CPU overhead. In other words, you're still better off buffering in your application.
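
To make the "buffer in your application" option concrete, here is a minimal sketch in Python. It assumes HTTP-style "Name: value" header lines terminated by a blank line, which the question does not actually specify; format_headers and send_headers are hypothetical helper names, not part of any library.

```python
import socket

def format_headers(headers):
    """Serialize a dict of headers into one bytes object.

    Assumption: HTTP-style "Name: value\r\n" lines, ended by a blank line.
    """
    lines = [f"{name}: {value}\r\n" for name, value in headers.items()]
    lines.append("\r\n")  # blank line marks the end of the header block
    return "".join(lines).encode("latin-1")

def send_headers(sock, headers):
    """Send every header with a single sendall() call; return bytes sent."""
    payload = format_headers(headers)
    sock.sendall(payload)  # sendall() keeps writing until all bytes are out
    return len(payload)
```

With this shape there is exactly one place where a send error can surface, which is the "error checking is a lot easier" point from the question.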

There is (at least) one case where you're better off without buffering: when the buffering itself costs more than the system-call overhead it saves. If you write a complicated buffer in Python, that might very well be the case. A buffer written in pure Python (running on CPython) is going to be quite a bit slower than the finely optimized buffer in the kernel. It's quite possible that buffering would cost you more than it buys you.

When in doubt, measure.
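
A rough way to measure, as a sketch only: the snippet below times both strategies over a local socket pair, so it captures the per-call overhead but none of the real network behavior (no congestion control, no packet loss). time_sends and its parameters are hypothetical names chosen for this example.

```python
import socket
import threading
import time

def _drain(sock):
    """Read and discard data until the peer closes the connection."""
    while sock.recv(65536):
        pass

def time_sends(chunks, buffered, repeats=1000):
    """Return the seconds spent sending `chunks` `repeats` times.

    buffered=True joins the chunks and sends once per message;
    buffered=False issues one sendall() per chunk.
    """
    sender, receiver = socket.socketpair()
    reader = threading.Thread(target=_drain, args=(receiver,))
    reader.start()  # keeps the socket buffer from filling up
    start = time.perf_counter()
    for _ in range(repeats):
        if buffered:
            sender.sendall(b"".join(chunks))  # one system call per message
        else:
            for chunk in chunks:              # one system call per header
                sender.sendall(chunk)
    elapsed = time.perf_counter() - start
    sender.close()  # the reader sees EOF and exits
    reader.join()
    receiver.close()
    return elapsed
```

Calling time_sends(chunks, True) and time_sends(chunks, False) with the same list of encoded header lines gives two numbers to compare; expect them to vary between runs and machines.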

One word of caution though: premature optimization is the root of all evil. The difference in efficiency here is pretty small. If you haven't already established that this is a bottleneck for your application, go with whatever makes your life easier. You can always change it later.

一个人的夜不怕黑 2024-09-05 01:28:38

Unless you're sending a truly huge amount of data, you're probably better off using one buffer. If you grow the buffer geometrically (for example, doubling its capacity each time it fills up), the number of reallocations grows only logarithmically with the final size, so the amortized cost per appended byte is constant.
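
Here is a small sketch of geometric growth, written in Python to mirror what a malloc/realloc-based buffer would do in C. GrowableBuffer is a hypothetical class made up for illustration; in real Python code a plain bytearray or b"".join() already does this for you.

```python
class GrowableBuffer:
    """Append-only byte buffer whose capacity doubles when it runs out."""

    def __init__(self, initial_capacity=64):
        self._data = bytearray(max(1, initial_capacity))  # pre-allocated storage
        self._size = 0           # bytes actually in use
        self.reallocations = 0   # how many times the buffer had to grow

    def append(self, chunk):
        needed = self._size + len(chunk)
        if needed > len(self._data):
            capacity = len(self._data)
            while capacity < needed:
                capacity *= 2    # geometric growth: double until it fits
            self._data.extend(bytes(capacity - len(self._data)))
            self.reallocations += 1
        self._data[self._size:needed] = chunk  # copy into the free space
        self._size = needed

    def getvalue(self):
        """Return the used part of the buffer as bytes."""
        return bytes(self._data[:self._size])
```

Appending 1000 chunks of 10 bytes to a buffer that starts at 64 bytes triggers only 8 growth steps (64 up to 16384), rather than one reallocation per append.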

影子的影子 2024-09-05 01:28:38

A send() call implies a round-trip to the kernel (the part of the OS which deals with the hardware directly). It has a unit cost of about a few hundred clock cycles. This is harmless unless you are trying to call send() millions of times.

Usually, buffering is about calling send() only once in a while, when "enough data" has been gathered. "Enough" does not mean "the whole message" but rather "enough bytes that the unit cost of the kernel round-trip is dwarfed". As a rule of thumb, an 8 kB buffer (8192 bytes) is traditionally considered good enough.
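
The "flush when enough data has been gathered" idea could be sketched like this. BufferedSender, threshold, and send_calls are hypothetical names for this example; the 8192 default is just the rule of thumb mentioned above.

```python
import socket

class BufferedSender:
    """Collect small writes and hand them to the socket in larger batches."""

    def __init__(self, sock, threshold=8192):
        self._sock = sock
        self._threshold = threshold   # flush once this many bytes are pending
        self._pending = bytearray()
        self.send_calls = 0           # counts trips to the kernel, for illustration

    def write(self, data):
        self._pending += data
        if len(self._pending) >= self._threshold:
            self.flush()

    def flush(self):
        """Send whatever is pending; does nothing if the buffer is empty."""
        if self._pending:
            self._sock.sendall(self._pending)
            self.send_calls += 1
            self._pending.clear()
```

The caller has to remember a final flush() at the end of the message, otherwise the tail stays in the buffer. In Python, sock.makefile("wb", buffering=8192) gives a similar buffered writer from the standard library, so a hand-written class like this is mostly useful for seeing the mechanism.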

Anyway, for all performance-related questions, nothing beats an actual measurement. Try it. Most of the time, there is no actual performance problem worth worrying about.
