Socket reads in a multithreaded application return zero bytes or EINTR (104)
I have been coding in C for a while now; I am neither a newbie nor an expert. I have a daemonized application written in C running on PPC Linux. A PHP client uses socket_connect to connect to this service locally, and the server uses epoll to multiplex connections over a Unix socket. A user-submitted string is parsed for certain characters/words using strstr() and, if they are found, the server spawns 4 joinable threads that contact different websites simultaneously. In each thread I use socket, connect, write and read to talk to those web servers via TCP on port 80. All connects and writes seem to succeed. Reads from the web server sockets fail, however, in one of two ways: (A) three of the threads seem to hang, and only one returns -1 with errno set to 104. The responding thread takes about 10 minutes, an eternity :-(. *I read somewhere that 104 (is it EINTR?) in a network context means 'connection reset by peer'. Or (B) three threads read 0 bytes, and only 1 of the 4 threads actually returns some data. Aren't socket reads and writes thread-safe? I use thread-safe (and reentrant) libc functions such as strtok_r, gethostbyname_r, etc.
*I doubt that the said web hosts are actually resetting the connection, because when I run a single-threaded standalone version (everything else being equal) everything works perfectly, though of course in series rather than in parallel.
There's a second problem too (oops): I can't write back to the clients that connect to my epoll-ed Unix socket. My daemon hangs and hogs more than 100% CPU forever, yet nothing is written to the client's end. I am sure the client (a very typical PHP socket application) hasn't closed the connection when this happens, and no errors are detected either. Any ideas?
I cannot figure out what is wrong, even with Valgrind, GDB and a lot of logging. Kindly help where you can.
Comments (2)
Yes, read/write are thread-safe. But beware of gethostbyname() and getservbyname() if you're using them - they return pointers to static data, and may not be thread-safe.
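As one possible alternative to the gethostbyname() family, POSIX getaddrinfo() hands back a freshly allocated result list rather than a pointer to static data. The sketch below is only an illustration: resolve_tcp is a hypothetical helper name, and it assumes the first address returned is good enough for the thread.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve host:port to the first TCP address found.
 * Returns 0 on success, or a getaddrinfo() error code (see gai_strerror). */
int resolve_tcp(const char *host, const char *port,
                struct sockaddr_storage *addr, socklen_t *addrlen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    int rc = getaddrinfo(host, port, &hints, &res);
    if (rc != 0)
        return rc;

    /* Copy into caller-owned storage so each thread keeps its own result. */
    memcpy(addr, res->ai_addr, res->ai_addrlen);
    *addrlen = res->ai_addrlen;
    freeaddrinfo(res);
    return 0;
}
```

Each worker thread would call this with its own sockaddr_storage and then pass the result to connect().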
On Linux, errno 104 is ECONNRESET (not EINTR). Use strerror or perror to get the textual error message (like 'Connection reset by peer') for a particular errno code.
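A minimal sketch of reporting an errno value by symbolic name as well as by message, so that a bare number in a log is not misread. Both errno_name and report_error are hypothetical helpers, and only a handful of codes are covered. Note that POSIX does not require strerror to be thread-safe, so strerror_r may be a better fit inside the worker threads.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: symbolic name for a few errno values relevant here. */
const char *errno_name(int err)
{
    switch (err) {
    case EINTR:        return "EINTR";
    case EAGAIN:       return "EAGAIN";
    case ECONNRESET:   return "ECONNRESET";
    case ECONNREFUSED: return "ECONNREFUSED";
    default:           return "(other)";
    }
}

/* Hypothetical helper: report a failed call with number, name and message. */
void report_error(const char *what, int err)
{
    fprintf(stderr, "%s failed: errno=%d %s (%s)\n",
            what, err, errno_name(err), strerror(err));
}
```

A caller would save errno immediately after the failing read() and pass that saved value in, since later library calls may overwrite errno.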
The best way to figure out what's going wrong is often to do very detailed logging - log the results of every operation, plus details like the IP address/port connecting to, the number of bytes read/written, the thread id, and so forth. And, of course, make sure your logging code is thread-safe :-)
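One simple way to keep log records from different threads from interleaving is to format each record into a local buffer and emit it with a single write() call. The sketch below assumes that approach; log_op, the record layout, and the caller-chosen worker number are all hypothetical. A single write() is not guaranteed to be atomic for every kind of file descriptor, so treat this as a starting point rather than a guarantee.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: emit one complete log record with a single write().
 * worker is a caller-chosen thread number; err is the saved errno (0 if none).
 * Returns the number of bytes written, or -1 on error. */
ssize_t log_op(int fd, int worker, const char *op, long result, int err)
{
    char line[256];
    int n = snprintf(line, sizeof line, "[worker %d] %s -> %ld (errno %d)\n",
                     worker, op, result, err);
    if (n < 0)
        return -1;
    if ((size_t)n >= sizeof line)
        n = (int)sizeof line - 1;   /* record was truncated to fit */
    return write(fd, line, (size_t)n);
}
```

For example, a worker could call log_op(log_fd, 2, "read", (long)nread, saved_errno) after every read, where log_fd is a descriptor opened with O_APPEND.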
Getting an ECONNRESET after 10 minutes sounds like the result of your connection timing out. Either the web server isn't sending the data or your app isn't receiving it.
To test the former, hook up a program like Wireshark to the network interface that carries the traffic and look for packets to and from the port you are using.
For the latter, take a look at the epoll(7) man page. It describes a scenario where using edge-triggered events can result in a lockup: there is still data in the buffer, but no new data comes in, so no new event is triggered.
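The usual way around that lockup is to make the descriptor non-blocking and, on each event, keep reading until read() reports EAGAIN. The sketch below shows this under stated assumptions: watch_edge_triggered and drain_fd are hypothetical helper names, and the data is simply discarded where a real server would parse it. It also shows the two cases from the question: a return of 0 means the peer closed the connection, and EINTR just means the call should be retried.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make fd non-blocking and register it edge-triggered.
 * Returns 0 on success, -1 on error (errno set by fcntl or epoll_ctl). */
int watch_edge_triggered(int epfd, int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    struct epoll_event ev = {0};
    ev.events = EPOLLIN | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: after an EPOLLIN event, read until the kernel buffer
 * is empty. Returns the bytes consumed, or -1 on a real error. *eof is set
 * to 1 if the peer closed the connection (read() returned 0). */
ssize_t drain_fd(int fd, int *eof)
{
    char buf[4096];
    ssize_t total = 0;

    *eof = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;          /* a real server would process buf here */
            continue;
        }
        if (n == 0) {            /* orderly shutdown by the peer */
            *eof = 1;
            return total;
        }
        if (errno == EINTR)      /* interrupted by a signal: just retry */
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;        /* buffer drained; wait for the next edge */
        return -1;               /* e.g. ECONNRESET */
    }
}
```

With level-triggered epoll (no EPOLLET) the drain loop is not strictly required, since the event is reported again for as long as unread data remains.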