How does an asynchronous socket server work?

Posted 2024-10-21 08:12:12

I should state that I'm not asking about specific implementation details (yet), but just a general overview of what's going on. I understand the basic concept behind a socket, and need clarification on the process as a whole. My (probably very wrong) understanding is currently this:

A socket is constantly listening for clients that want to connect (in its own thread). When a connection occurs, an event is raised that spawns another thread to perform the connection process. During the connection process, the client is assigned its own socket through which to communicate with the server. The server then waits for data from the client, and when data arrives, an event is raised that spawns a thread to read the data from a stream into a buffer.

My questions are:

How off is my understanding?

Does each client socket require its own thread to listen for data on?

How is data routed to the correct client socket? Is this something taken care of by the guts of TCP/UDP/kernel?

In this threaded environment, what kind of data is typically being shared, and what are the points of contention?

Any clarifications and additional explanation would be greatly appreciated.

EDIT:

Regarding the question about what data is typically shared and the points of contention, I realize this is more of an implementation detail than a question about the general process of accepting connections and sending/receiving data. I had looked at a couple of implementations (SuperSocket and Kayak) and noticed some synchronization for things like the session cache and reusable buffer pools. Feel free to ignore this question. I've appreciated all your feedback.

Comments (2)

迷爱 2024-10-28 08:12:12

One thread per connection is bad design (not scalable, overly complex) but unfortunately way too common.

A socket server works more or less like this:

  • A listening socket is set up to accept connections, and added to a socket set
  • The socket set is checked for events
  • If the listening socket has pending connections, new sockets are created by accepting the connections, and then added to the socket set
  • If a connected socket has events, the relevant IO functions are called
  • The socket set is checked for events again

This all happens in one thread. You can easily handle thousands of connected sockets in a single thread, and there are few valid reasons for making this more complex by introducing threads.

while running
    select on socketset
    for each socket with events
        if socket is listener
            accept new connected socket
            add new socket to socketset
        else if socket is connection
            if event is readable
                read data
                process data
            else if event is writable
                write queued data
            else if event is closed connection
                remove socket from socketset
            end
        end
    done
done
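
As a rough illustration, the pseudocode above can be sketched in Python with the standard `selectors` module. This is only a minimal sketch under assumptions that are not in the answer: the "process data" step simply echoes the bytes back, and the names `EchoServer`, `poll_once` and `demo` are made up for the example.

```python
import selectors
import socket


class EchoServer:
    """Hypothetical single-threaded echo server; the selector is the "socket set"."""

    def __init__(self, host="127.0.0.1", port=0):
        self.sel = selectors.DefaultSelector()
        self.listener = socket.create_server((host, port))
        self.listener.setblocking(False)
        self.sel.register(self.listener, selectors.EVENT_READ)
        self.outgoing = {}  # connected socket -> bytearray of queued data

    @property
    def address(self):
        return self.listener.getsockname()

    def _accept(self):
        try:
            conn, _peer = self.listener.accept()  # new connected socket
        except BlockingIOError:
            return  # the pending connection went away before we accepted it
        conn.setblocking(False)
        self.outgoing[conn] = bytearray()
        self.sel.register(conn, selectors.EVENT_READ)  # add to the socket set

    def _drop(self, conn):
        # "remove socket from socketset"
        self.sel.unregister(conn)
        conn.close()
        del self.outgoing[conn]

    def _service(self, sock, events):
        try:
            if events & selectors.EVENT_READ:
                data = sock.recv(4096)
                if not data:  # an empty read means the peer closed the connection
                    self._drop(sock)
                    return
                self.outgoing[sock] += data  # "process data": assumed to be an echo
            if events & selectors.EVENT_WRITE and self.outgoing[sock]:
                sent = sock.send(self.outgoing[sock])  # "write queued data"
                del self.outgoing[sock][:sent]
        except BlockingIOError:
            pass  # socket was not actually ready; try again on a later pass
        except ConnectionError:
            self._drop(sock)
            return
        # Watch for writability only while there is queued data to send.
        mask = selectors.EVENT_READ
        if self.outgoing[sock]:
            mask |= selectors.EVENT_WRITE
        self.sel.modify(sock, mask)

    def poll_once(self, timeout=0.05):
        """One pass of the loop: wait for events, then service each ready socket."""
        for key, events in self.sel.select(timeout):
            if key.fileobj is self.listener:
                self._accept()
            else:
                self._service(key.fileobj, events)

    def close(self):
        for conn in list(self.outgoing):
            self._drop(conn)
        self.sel.unregister(self.listener)
        self.listener.close()
        self.sel.close()


def demo():
    """Drive two clients and the server from the same thread."""
    server = EchoServer()
    a = socket.create_connection(server.address, timeout=2)
    b = socket.create_connection(server.address, timeout=2)
    a.sendall(b"hello")
    b.sendall(b"world")
    for _ in range(10):  # a handful of passes is enough to accept, read and write
        server.poll_once()
    replies = (a.recv(4096), b.recv(4096))
    a.close()
    b.close()
    server.close()
    return replies
```

A real server would call `poll_once` in a `while running` loop. The demo runs a fixed number of passes only so that the clients and the server can share one thread.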

The IP stack takes care of all the details of which packets go to which "socket", and in which order. Seen from the application's point of view, a socket represents a reliable, ordered byte stream (TCP) or an unreliable, unordered sequence of packets (UDP).
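
To make the routing concrete, here is a small sketch that assumes a loopback connection and uses a made-up function name, `demo_demux`. Two clients connect to the same server port, and each accepted socket still receives only the bytes from its own client. The application never does this matching itself: the stack tells connections apart by local address, local port, remote address and remote port.

```python
import socket


def demo_demux():
    """Hypothetical demo: two connections on one server port stay separate."""
    listener = socket.create_server(("127.0.0.1", 0))
    listener.settimeout(2)
    server_addr = listener.getsockname()

    # Both clients connect to the same server address and port.
    client_a = socket.create_connection(server_addr, timeout=2)
    client_b = socket.create_connection(server_addr, timeout=2)

    # accept() returns one new socket per connection, along with the peer address.
    accepted = {}
    for _ in range(2):
        conn, peer = listener.accept()
        conn.settimeout(2)
        accepted[peer] = conn

    client_a.sendall(b"from A")
    client_b.sendall(b"from B")

    # Find each server-side socket by the client's own (address, port) pair.
    got_a = accepted[client_a.getsockname()].recv(1024)
    got_b = accepted[client_b.getsockname()].recv(1024)

    # The clients share a destination port but have distinct source ports.
    ports_differ = client_a.getsockname()[1] != client_b.getsockname()[1]

    for s in (client_a, client_b, listener, *accepted.values()):
        s.close()
    return got_a, got_b, ports_differ
```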

EDIT: In response to updated question.

I don't know either of the libraries you mention, but on the concepts you mention:

  • A session cache typically keeps data associated with a client, and can reuse this data for multiple connections. This makes sense when your application logic requires state information, but it's a layer higher than the actual networking end. In the above sample, the session cache would be used by the "process data" part.
  • Buffer pools are also an easy and often effective optimization for a high-traffic server. The concept is very easy to implement: instead of allocating and deallocating space for the data you read or write, you fetch a preallocated buffer from a pool, use it, then return it to the pool. This avoids the (sometimes relatively expensive) backend allocation/deallocation mechanisms. It is not directly related to networking; you can just as well use a buffer pool for, e.g., something that reads chunks of files and processes them.
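
A buffer pool along these lines might look like the sketch below. `BufferPool`, `read_chunk` and `demo_pool` are made-up names, and the pool has no locking, which is only safe on the assumption that it is used from the single-threaded loop described above. In Python the saving is modest; the idea matters more in languages with manual memory management.

```python
import socket


class BufferPool:
    """Hypothetical fixed-size pool of preallocated bytearrays (not thread-safe)."""

    def __init__(self, count, size):
        self.size = size
        self._free = [bytearray(size) for _ in range(count)]

    def acquire(self):
        # Fall back to a fresh allocation if the pool is exhausted.
        return self._free.pop() if self._free else bytearray(self.size)

    def release(self, buf):
        self._free.append(buf)

    def available(self):
        return len(self._free)


def read_chunk(sock, pool):
    """Receive into a pooled buffer and return a copy of the bytes received."""
    buf = pool.acquire()
    try:
        n = sock.recv_into(buf)            # fills the existing buffer in place
        return bytes(memoryview(buf)[:n])  # copy out only what was received
    finally:
        pool.release(buf)                  # always hand the buffer back


def demo_pool():
    pool = BufferPool(count=2, size=1024)
    left, right = socket.socketpair()      # a connected pair, no network needed
    left.sendall(b"ping")
    data = read_chunk(right, pool)
    left.close()
    right.close()
    return data, pool.available()
```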
夜灵血窟げ 2024-10-28 08:12:12

How off is my understanding?

Pretty far.

Does each client socket require its own thread to listen for data on?

No.

How is data routed to the correct client socket? Is this something taken care of by the guts of TCP/UDP/kernel?

TCP/IP is a stack of protocol layers. There's no "kernel" to it. It's made of pieces, each with a separate API to the other pieces.

The IP address is handled in one place.

The port # is handled in another place.

The IP addresses are matched up with MAC addresses to identify a particular host. The port # is what ties a TCP (or UDP) socket to a particular piece of application software.
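
The role of the port number can be seen in a small sketch that assumes UDP on the loopback interface and uses a made-up function name, `demo_ports`. Two sockets on the same host are bound to different ports, and a datagram addressed to one port is delivered only to the socket bound to it.

```python
import socket


def demo_ports():
    """Hypothetical demo: a datagram reaches only the socket bound to its port."""
    # Two UDP sockets on the same host, each bound to its own OS-assigned port.
    app_one = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    app_two = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for s in (app_one, app_two):
        s.bind(("127.0.0.1", 0))
        s.settimeout(1)

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.sendto(b"for app one", app_one.getsockname())

    payload, _source = app_one.recvfrom(1024)
    try:
        app_two.recvfrom(1024)
        two_got_data = True
    except socket.timeout:  # nothing was addressed to app_two's port
        two_got_data = False

    for s in (app_one, app_two, sender):
        s.close()
    return payload, two_got_data
```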

In this threaded environment, what kind of data is typically being shared, and what are the points of contention?

What threaded environment?

Data sharing? What?

Contention? The physical channel is the number one point of contention. (Ethernet, for example, depends on collision detection.) After that, well, every part of the computer system is a scarce resource shared by multiple applications, and so is a point of contention.
