Java Solaris NIO OP_CONNECT problem
I have a Java client that connects to a C++ server over TCP sockets using Java NIO. This works under Linux, AIX and HP/UX, but under Solaris the OP_CONNECT event never fires.
Further details:
- Selector.select() returns 0, and the 'selected key set' is empty.
- The issue only occurs when connecting to the local machine (via loopback or ethernet interface); connecting to a remote machine works.
- I have confirmed the issue on two different Solaris 10 machines, a physical SPARC and a virtual x64 (VMware), using JDK versions 1.6.0_21 and 1.6.0_26.
Here is some test code which demonstrates the issue:
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class NioTest3
{
    public static void main(String[] args)
    {
        int i, tcount = 1, open = 0;
        String[] addr = args[0].split(":");
        int port = Integer.parseInt(addr[1]);
        if (args.length == 2)
            tcount = Integer.parseInt(args[1]);
        InetSocketAddress inetaddr = new InetSocketAddress(addr[0], port);
        try
        {
            Selector selector = Selector.open();
            SocketChannel channel;
            for (i = 0; i < tcount; i++)
            {
                channel = SocketChannel.open();
                channel.configureBlocking(false);
                channel.register(selector, SelectionKey.OP_CONNECT);
                channel.connect(inetaddr);
            }
            open = tcount;
            while (open > 0)
            {
                int selected = selector.select();
                System.out.println("Selected=" + selected);
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext())
                {
                    SelectionKey key = it.next();
                    it.remove();
                    channel = (SocketChannel)key.channel();
                    if (key.isConnectable())
                    {
                        System.out.println("isConnectable");
                        if (channel.finishConnect())
                        {
                            System.out.println(formatAddr(channel) + " connected");
                            key.interestOps(SelectionKey.OP_WRITE);
                        }
                    }
                    else if (key.isWritable())
                    {
                        System.out.println(formatAddr(channel) + " isWritable");
                        String message = formatAddr(channel) + " the quick brown fox jumps over the lazy dog";
                        ByteBuffer buffer = ByteBuffer.wrap(message.getBytes());
                        channel.write(buffer);
                        key.interestOps(SelectionKey.OP_READ);
                    }
                    else if (key.isReadable())
                    {
                        System.out.println(formatAddr(channel) + " isReadable");
                        ByteBuffer buffer = ByteBuffer.allocate(1024);
                        channel.read(buffer);
                        buffer.flip();
                        byte[] bytes = new byte[buffer.remaining()];
                        buffer.get(bytes);
                        String message = new String(bytes);
                        System.out.println(formatAddr(channel) + " read: '" + message + "'");
                        channel.close();
                        open--;
                    }
                }
            }
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }

    static String formatAddr(SocketChannel channel)
    {
        return Integer.toString(channel.socket().getLocalPort());
    }
}
You can run this using the command line:
java -cp . NioTest3 <ipaddr>:<port> <num-connections>
Where port should be 7 if you are running against a real echo service; i.e.:
java -cp . NioTest3 127.0.0.1:7 5
If you cannot get a real echo service running then the source to one is here. Compile the echo server under Solaris with:
$ cc -o echoserver echoserver.c -lsocket -lnsl
and run it like this:
$ ./echoserver 8007 > out 2>&1 &
This has been reported to Sun as a bug.
Your bug report has been closed as 'not a bug', with an explanation. You are ignoring the result of connect(), which if true means that OP_CONNECT will never fire, because the channel is already connected. You only need the whole OP_CONNECT/finishConnect() megillah if it returns false. So you shouldn't even register OP_CONNECT unless connect() returns false, let alone register it before you've even called connect().
Further remarks:
Under the hood, OP_CONNECT and OP_WRITE are the same thing, which explains part of it.
As you have a single thread for this, the workaround would be to do the connect in blocking mode, then switch to non-blocking for the I/O.
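The blocking-connect workaround might look roughly like the sketch below. The class name `BlockingConnectThenSelect` and the helper `connectAndRegister` are hypothetical, and registering for OP_WRITE afterwards is an assumption that mirrors the "write first, then read" flow of the test program in the question.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

final class BlockingConnectThenSelect {

    // Hypothetical helper: connect while the channel is still in blocking mode,
    // then switch to non-blocking mode and register for I/O events only.
    static SelectionKey connectAndRegister(Selector selector, InetSocketAddress addr)
            throws IOException {
        SocketChannel channel = SocketChannel.open(); // new channels start in blocking mode
        try {
            channel.connect(addr);                    // blocks until connected, or throws
            channel.configureBlocking(false);         // must be non-blocking before register()
            // No OP_CONNECT needed: the channel is already connected at this point.
            return channel.register(selector, SelectionKey.OP_WRITE);
        } catch (IOException e) {
            channel.close();
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        SelectionKey key = connectAndRegister(
                selector, new InetSocketAddress(args[0], Integer.parseInt(args[1])));
        SocketChannel channel = (SocketChannel) key.channel();
        System.out.println("connected from local port " + channel.socket().getLocalPort());
        channel.close();
        selector.close();
    }
}
```

The trade-off is that the connecting thread stalls for the duration of each connect, which is only acceptable when a single thread owns the whole exchange, as it does here.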
Are you doing the select() after registering the channel with the Selector?
The correct way of handling non-blocking connect is as follows:
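The code that originally followed this sentence did not survive on this page. What follows is not the answerer's listing but a minimal sketch of the sequence the answer describes: check the result of connect(), register OP_CONNECT only when the connection is still pending, and treat a false return from finishConnect() as "not finished yet". The names `NonBlockingConnect`, `startConnect` and `completeConnect` are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

final class NonBlockingConnect {

    // Starts a non-blocking connect and registers the interest that matches its result.
    static SelectionKey startConnect(Selector selector, InetSocketAddress addr)
            throws IOException {
        SocketChannel channel = SocketChannel.open();
        try {
            channel.configureBlocking(false);
            if (channel.connect(addr)) {
                // Connected immediately (can happen for local connections):
                // OP_CONNECT would never fire, so go straight to I/O interest.
                return channel.register(selector, SelectionKey.OP_WRITE);
            }
            // Connection still pending: now OP_CONNECT is the right interest.
            return channel.register(selector, SelectionKey.OP_CONNECT);
        } catch (IOException e) {
            channel.close();
            throw e;
        }
    }

    // Call when key.isConnectable(). Returns false if the connect has not
    // completed yet; throws IOException if the connection failed.
    static boolean completeConnect(SelectionKey key) throws IOException {
        SocketChannel channel = (SocketChannel) key.channel();
        if (!channel.finishConnect()) {
            return false;                       // not a failure, keep waiting on OP_CONNECT
        }
        key.interestOps(SelectionKey.OP_WRITE); // replace OP_CONNECT with I/O interest
        return true;
    }

    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        startConnect(selector, new InetSocketAddress(args[0], Integer.parseInt(args[1])));
        boolean connected = false;
        while (!connected) {
            selector.select();
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                try {
                    if (key.isConnectable()) {
                        connected = completeConnect(key);
                    } else if (key.isWritable()) {
                        connected = true;       // the connect() call itself returned true
                    }
                } catch (IOException e) {
                    key.channel().close();      // closing the channel also cancels the key
                    throw e;
                }
            }
        }
        System.out.println("connected");
        selector.close();
    }
}
```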
A few non-exhaustive comments after reviewing your extended code:
- Closing a channel cancels the key. You don't need to do both.
- The non-static removeInterest() method is incorrectly implemented.
- TYPE_DEREGISTER_OBJECT also closes the channel. Not sure if that is what you really intended. I would have thought it should just cancel the key, and there should be a separate operation for closing the channel.
- You have gone way overboard on the small methods and exception handling. addInterest() and removeInterest() are good examples. They catch exceptions, log them, then proceed as though the exception hadn't happened, when all they actually do is set or clear a bit: one line of code. And on top of that many of them have both static and non-static versions. Same goes for all the little methods that call key.cancel(), channel.close(), etc. There is no point to all this, it is just clocking up lines of code. It only adds obscurity and makes your code harder to understand. Just do the operation required inline and have a single catcher at the bottom of the select loop.
- If finishConnect() returns false it isn't a connection failure, it just hasn't completed yet. If it throws an Exception, that is a connection failure.
- You are registering for OP_CONNECT and OP_READ at the same time. This doesn't make sense, and it may cause problems. There is nothing to read until OP_CONNECT has fired. Just register for OP_CONNECT at first.
- You are allocating a ByteBuffer per read. This is very wasteful. Use the same one for the life of the connection.
- You are ignoring the result of read(). It can be zero. It can be -1, indicating EOS, on which you must close the channel. You are also assuming you will get an entire application message in a single read. You can't assume that. That's another reason why you should use a single ByteBuffer for the life of the connection.
- You are ignoring the result of write(). It could be less than buffer.remaining() was when you called it. It can be zero.
- You could simplify this a lot by making the NetSelectable the key attachment. Then you could do away with several things, including for example the channel map, and the assert, because the channel of the key must always be equal to the channel of the attachment of the key.
- I would also definitely move the finishConnect() code to the NetSelector, and have connectEvent() just be a success/failure notification. You don't want to spread this kind of stuff around. Do the same with readEvent(), i.e. do the read itself in NetSelector, with a buffer supplied by the NetSelectable, and just notify the NetSelectable of the read result: count or -1 or exception. Ditto on write: if the channel is writable, get something to write from the NetSelectable, write it in the NetSelector, and notify the result. You could have the notification callbacks return something to indicate what to do next, e.g. close the channel.
But really this is all five times as complex as it needs to be, and the fact that you have this bug proves it. Simplify your head.
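To illustrate the points above about read() and write() results, here is a minimal sketch. It is not the NetSelector/NetSelectable code under review, which is not shown on this page. For brevity it assumes a plain ByteBuffer is used as the key attachment (the answer suggests attaching the NetSelectable itself), and the names `ReadWriteResults`, `handleRead` and `handleWrite` are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

final class ReadWriteResults {

    // Reads into the single buffer attached to the key for the life of the connection.
    // Returns false if the peer closed the connection (end of stream).
    static boolean handleRead(SelectionKey key) throws IOException {
        SocketChannel channel = (SocketChannel) key.channel();
        ByteBuffer buffer = (ByteBuffer) key.attachment(); // assumption: buffer is the attachment
        int count = channel.read(buffer);
        if (count == -1) {
            channel.close();   // EOS: closing the channel also cancels the key
            return false;
        }
        // count may be 0, or cover only part of a message. The data accumulates in
        // the buffer until the caller decides a complete message has arrived.
        return true;
    }

    // Writes as much of 'pending' as the socket accepts and keeps OP_WRITE
    // interest only while unwritten data remains.
    static void handleWrite(SelectionKey key, ByteBuffer pending) throws IOException {
        SocketChannel channel = (SocketChannel) key.channel();
        channel.write(pending); // may write fewer bytes than pending.remaining(), even zero
        if (pending.hasRemaining()) {
            key.interestOps(key.interestOps() | SelectionKey.OP_WRITE);
        } else {
            key.interestOps(key.interestOps() & ~SelectionKey.OP_WRITE);
        }
    }
}
```

Both methods let IOException propagate, so a single catch at the bottom of the select loop can close the channel, in line with the "single catcher" advice above.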
I have worked around this bug using the following:
If Selector.select() returns 0 (and didn't time out, if the timeout version was used) then:
- Iterate over the keys registered with the selector via selector.keys().iterator() (remembering not to call iterator.remove()).
- If OP_CONNECT interest has been set with the key then call channel.finishConnect() and do whatever would have been done if isConnectable() had returned true.
For example:
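The original example code is missing from this page. A sketch of the workaround as described might look like the following. The names `ConnectWorkaround` and `finishPendingConnects` are hypothetical, and switching the key to OP_WRITE after a successful connect is an assumption taken from the test program in the question.

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

final class ConnectWorkaround {

    // Call when select() returned 0 without timing out. Walks every registered key
    // and completes connects that the selector never reported.
    // Returns the number of channels that are now connected.
    static int finishPendingConnects(Selector selector) {
        int completed = 0;
        // selector.keys() cannot be modified directly: iterate only, never remove.
        for (SelectionKey key : selector.keys()) {
            if (!key.isValid() || (key.interestOps() & SelectionKey.OP_CONNECT) == 0) {
                continue;
            }
            SocketChannel channel = (SocketChannel) key.channel();
            try {
                if (channel.finishConnect()) {
                    // Same action as the isConnectable() branch of the select loop.
                    key.interestOps(SelectionKey.OP_WRITE);
                    completed++;
                }
            } catch (IOException e) {
                // The connection failed. Closing the channel cancels its key; the key
                // is dropped from the key set during the next select().
                try {
                    channel.close();
                } catch (IOException ignored) {
                    // nothing more to do for this channel
                }
            }
        }
        return completed;
    }
}
```

In the select loop this would be called as `if (selector.select() == 0) { finishPendingConnects(selector); }` before iterating over the selected keys.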
This has been reported to Sun as a bug.