Network error on PASE on an iSeries machine
I am running a server program written in C on PASE on an iSeries machine.
PASE (Portable Application Solutions Environment) is an AIX runtime environment on IBM iSeries machines.
The server program is a connection-oriented, iterative TCP server.
The server logic calls accept(), which returns a socket descriptor.
This is followed by a call to ioctl() to set the socket non-blocking using FIONBIO.
This call to ioctl() fails intermittently, returning -1 with errno = 9 (EBADF: bad file descriptor),
for approximately 0.8% of the calls. Once it fails for a particular socket descriptor,
subsequent failures are always for the same socket descriptor and with the same errno.
When this happens, the client side fails with errno = 73, i.e. connection reset by peer.
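For reference, the accept-then-ioctl sequence described above might look roughly like the sketch below. The helper names set_nonblocking and accept_nonblocking are hypothetical (they are not from my actual server), and the logging is only there to record which descriptor and errno are involved when the call fails.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode with the FIONBIO
 * ioctl request. Returns 0 on success, -1 on failure with errno preserved. */
int set_nonblocking(int fd)
{
    int on = 1;

    if (ioctl(fd, FIONBIO, &on) == -1) {
        int saved = errno;
        /* Record the descriptor and errno so repeated failures can be compared. */
        fprintf(stderr, "ioctl(FIONBIO) failed: fd=%d errno=%d (%s)\n",
                fd, saved, strerror(saved));
        errno = saved;
        return -1;
    }
    return 0;
}

/* Hypothetical helper: accept one connection (retrying if interrupted by a
 * signal) and make it non-blocking. Returns the new descriptor, or -1. */
int accept_nonblocking(int listen_fd)
{
    int fd;

    do {
        fd = accept(listen_fd, NULL, NULL);
    } while (fd == -1 && errno == EINTR);

    if (fd == -1)
        return -1;

    if (set_nonblocking(fd) == -1) {
        int saved = errno;
        close(fd);          /* do not leak the descriptor on failure */
        errno = saved;
        return -1;
    }
    return fd;
}
```

Logging the descriptor number on every failure is a cheap way to confirm whether the failures really cluster on one descriptor value.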
The server is a daemon process, so stdin is closed on initialization, and descriptor 0 is therefore free for accept() to return.
Initially I observed that ioctl() failed for socket descriptor 0, but not always.
Hence, I tried to prevent reuse of descriptor 0 by pointing stdin at '/dev/null', in case that was the issue.
But I am not sure whether this was the main issue. I have yet to get test results after this change.
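A minimal sketch of the '/dev/null' change, assuming a POSIX-style open()/dup2() interface; the helper name redirect_to_devnull is hypothetical. The idea is that descriptor 0 stays occupied, so accept() can never hand it back as a socket.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make target_fd refer to /dev/null so the descriptor
 * number stays occupied. Returns 0 on success, -1 on failure. */
int redirect_to_devnull(int target_fd)
{
    int fd = open("/dev/null", O_RDWR);

    if (fd == -1)
        return -1;

    /* If target_fd was already closed, open() may have reused that number. */
    if (fd == target_fd)
        return 0;

    if (dup2(fd, target_fd) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    close(fd);              /* keep only the copy at target_fd */
    return 0;
}
```

During daemon initialization this would be called as redirect_to_devnull(0); calling it for descriptors 1 and 2 as well keeps all three standard descriptors from being reused for sockets.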
The issue has been observed only on some machines, and usually when the machine is under load. So this seems to be some sort of race condition.
The server logic is well tested and seems to be stable.
Have any socket-related issues been observed on the PASE or AIX platforms? Could this be OS related?
Any help or pointers with this issue would be appreciated.
Thanks in advance,
avg
Comments (1)
Is there any chance you are running up against the default maximum of 200 file descriptors per job?
If so, you can use the DosSetRelMaxFH() (Change Maximum Number of File Descriptors) API to increase the limit.
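As a quick check before changing anything, the process can print its own descriptor limit. The sketch below uses the portable POSIX getrlimit() call rather than DosSetRelMaxFH(); the helper name get_fd_soft_limit is hypothetical, and whether RLIMIT_NOFILE inside PASE reflects the per-job limit mentioned above is an assumption you would need to verify on your system.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: fetch the soft limit on open descriptors for this
 * process via POSIX getrlimit(). Returns 0 on success, -1 on error.
 * Assumption: on PASE this may or may not match the per-job limit. */
int get_fd_soft_limit(unsigned long long *out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) == -1)
        return -1;

    *out = (unsigned long long)rl.rlim_cur;
    return 0;
}

/* Hypothetical helper: log the limit once, e.g. at server start-up. */
void log_fd_soft_limit(void)
{
    unsigned long long limit;

    if (get_fd_soft_limit(&limit) == 0)
        fprintf(stderr, "soft descriptor limit: %llu\n", limit);
    else
        perror("getrlimit(RLIMIT_NOFILE)");
}
```

If the limit were the problem, I would normally expect accept() itself to fail (typically with EMFILE) rather than ioctl() failing with EBADF, so checking accept()'s errno is worth doing too.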
If that's not the issue, I suggest collecting and examining an SST communications trace of the error. See the TCP/IP Communications Trace Instructions for more information.
Next, I would check the group PTF levels, especially the SF99315 TCP/IP Group PTF.
IBM support is really helpful in tracking down issues like these.