Weblogic EJB 调用在中等负载下开始失败并出现OptionalDataException

发布于 2024-08-24 22:35:18 字数 4332 浏览 11 评论 0原文

我们的系统设置由两台 Weblogic 10.3 服务器组成:一台托管表示层,另一台托管 EJB。系统在中等负载下运行良好一段时间(一到几天),之后从表示服务器到 EJB 服务器的 EJB 方法调用开始失败,并出现以下错误:

java.rmi.RemoteException: java.rmi.UnmarshalException: error unmarshalling arguments; nested exception is: java.io.OptionalDataException

堆栈跟踪:

java.io.OptionalDataException
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
    at weblogic.utils.io.ChunkedObjectInputStream.readObject(ChunkedObjectInputStream.java:197)
    at weblogic.rjvm.MsgAbbrevInputStream.readObject(MsgAbbrevInputStream.java:564)
    at weblogic.utils.io.ChunkedObjectInputStream.readObject(ChunkedObjectInputStream.java:193)
    at weblogic.jndi.internal.RootNamingNode_WLSkel.invoke(Unknown Source)
    at weblogic.rmi.internal.BasicServerRef.invoke(BasicServerRef.java:589)
    at weblogic.rmi.cluster.ClusterableServerRef.invoke(ClusterableServerRef.java:230)
    at weblogic.rmi.internal.BasicServerRef$1.run(BasicServerRef.java:477)
    at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:363)
    at weblogic.security.service.SecurityManager.runAs(Unknown Source)
    at weblogic.rmi.internal.BasicServerRef.handleRequest(BasicServerRef.java:473)
    at weblogic.rmi.internal.wls.WLSExecuteRequest.run(WLSExecuteRequest.java:118)

一旦遇到第一个OptionalDataException,所有后续调用都会失败得到相同的结果。一些消息来源表明,这可能与集群多播端口配置错误有关。但是,这些服务器不属于集群。

启动 EJB 服务器总是可以暂时解决该问题,但该问题似乎在一段时间后再次出现。

更新:看来问题毕竟与套接字连接数溢出无关(请参阅下面我自己的答案)。在禁止网络类加载之后,我们运行得非常稳定一周,之后我们又开始在演示服务器上收到OptionalDataExceptions(堆栈跟踪如下)。很奇怪的是,系统正常工作一周后就开始出现故障。

javax.naming.CommunicationException [Root exception is java.rmi.UnmarshalException: error unmarshalling arguments; nested exception is:
    java.io.OptionalDataException]
    at weblogic.jndi.internal.ExceptionTranslator.toNamingException(ExceptionTranslator.java:74)
    at weblogic.jndi.internal.WLContextImpl.translateException(WLContextImpl.java:439)
    at weblogic.jndi.internal.WLContextImpl.lookup(WLContextImpl.java:395)
    at weblogic.jndi.internal.WLContextImpl.lookup(WLContextImpl.java:380)
    at javax.naming.InitialContext.lookup(InitialContext.java:392)
    ...
Caused by: java.rmi.UnmarshalException: error unmarshalling arguments; nested exception is:

    java.io.OptionalDataException
    at weblogic.rjvm.ResponseImpl.unmarshalReturn(ResponseImpl.java:234)
    at weblogic.rmi.cluster.ClusterableRemoteRef.invoke(ClusterableRemoteRef.java:348)
    at weblogic.rmi.cluster.ClusterableRemoteRef.invoke(ClusterableRemoteRef.java:259)
    at weblogic.jndi.internal.ServerNamingNode_1030_WLStub.lookup(Unknown Source)
    at weblogic.jndi.internal.WLContextImpl.lookup(WLContextImpl.java:392)  
    ... 38 more
Caused by: java.io.OptionalDataException
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
    at     
    weblogic.utils.io.ChunkedObjectInputStream.readObject(ChunkedObjectInputStream.java:197)
    at weblogic.rjvm.MsgAbbrevInputStream.readObject(MsgAbbrevInputStream.java:564)
    at     
weblogic.utils.io.ChunkedObjectInputStream.readObject(ChunkedObjectInputStream.java:193)
    at weblogic.jndi.internal.RootNamingNode_WLSkel.invoke(Unknown Source)
    at weblogic.rmi.internal.BasicServerRef.invoke(BasicServerRef.java:589)
    at weblogic.rmi.cluster.ClusterableServerRef.invoke(ClusterableServerRef.java:230)
    at weblogic.rmi.internal.BasicServerRef$1.run(BasicServerRef.java:477)
    at        
weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:363)
    at weblogic.security.service.SecurityManager.runAs(Unknown Source)
    at weblogic.rmi.internal.BasicServerRef.handleRequest(BasicServerRef.java:473)
    at weblogic.rmi.internal.wls.WLSExecuteRequest.run(WLSExecuteRequest.java:118)
    ... 2 more

我们以非常标准的方式获取初始上下文:

Properties p = new Properties();
p.put(Context.INITIAL_CONTEXT_FACTORY, "weblogic.jndi.WLInitialContextFactory");
p.put(Context.PROVIDER_URL, serverPath);
Context context = new InitialContext(p);

对任何获取​​的引用的调用也会失败,并出现类似的OptionalDataException。单独启动演示服务器可以暂时解决该问题。

Our system setup consists of two Weblogic 10.3 servers: one hosts the presentation layer and the other hosts the EJBs. The system runs fine under moderate load for some time (one to several days) after which EJB method calls from the presentation server to the EJB server start to fail with the following error:

java.rmi.RemoteException: java.rmi.UnmarshalException: error unmarshalling arguments; nested exception is: java.io.OptionalDataException

Stack trace:

java.io.OptionalDataException
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
    at weblogic.utils.io.ChunkedObjectInputStream.readObject(ChunkedObjectInputStream.java:197)
    at weblogic.rjvm.MsgAbbrevInputStream.readObject(MsgAbbrevInputStream.java:564)
    at weblogic.utils.io.ChunkedObjectInputStream.readObject(ChunkedObjectInputStream.java:193)
    at weblogic.jndi.internal.RootNamingNode_WLSkel.invoke(Unknown Source)
    at weblogic.rmi.internal.BasicServerRef.invoke(BasicServerRef.java:589)
    at weblogic.rmi.cluster.ClusterableServerRef.invoke(ClusterableServerRef.java:230)
    at weblogic.rmi.internal.BasicServerRef$1.run(BasicServerRef.java:477)
    at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:363)
    at weblogic.security.service.SecurityManager.runAs(Unknown Source)
    at weblogic.rmi.internal.BasicServerRef.handleRequest(BasicServerRef.java:473)
    at weblogic.rmi.internal.wls.WLSExecuteRequest.run(WLSExecuteRequest.java:118)

Once the first OptionalDataException is encountered all subsequent calls fail with the same result. Some sources suggest that this might be related to cluster multicast port being misconfigured. However, these servers do not belong to a cluster.

Booting the EJB server always temporarily resolves the issue, but the issue seems to occur again after some time.

Update: it seems that the problem is not related to an overflow in the number of socket connections after all (see my own answer below). After disallowing network classloading we ran very steadily for a week after which we started receiving OptionalDataExceptions on the presentation server again (stack trace below). It is very strange that the system works fine for a week and then starts to fail.

javax.naming.CommunicationException [Root exception is java.rmi.UnmarshalException: error unmarshalling arguments; nested exception is:
    java.io.OptionalDataException]
    at weblogic.jndi.internal.ExceptionTranslator.toNamingException(ExceptionTranslator.java:74)
    at weblogic.jndi.internal.WLContextImpl.translateException(WLContextImpl.java:439)
    at weblogic.jndi.internal.WLContextImpl.lookup(WLContextImpl.java:395)
    at weblogic.jndi.internal.WLContextImpl.lookup(WLContextImpl.java:380)
    at javax.naming.InitialContext.lookup(InitialContext.java:392)
    ...
Caused by: java.rmi.UnmarshalException: error unmarshalling arguments; nested exception is:

    java.io.OptionalDataException
    at weblogic.rjvm.ResponseImpl.unmarshalReturn(ResponseImpl.java:234)
    at weblogic.rmi.cluster.ClusterableRemoteRef.invoke(ClusterableRemoteRef.java:348)
    at weblogic.rmi.cluster.ClusterableRemoteRef.invoke(ClusterableRemoteRef.java:259)
    at weblogic.jndi.internal.ServerNamingNode_1030_WLStub.lookup(Unknown Source)
    at weblogic.jndi.internal.WLContextImpl.lookup(WLContextImpl.java:392)  
    ... 38 more
Caused by: java.io.OptionalDataException
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1349)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
    at     
    weblogic.utils.io.ChunkedObjectInputStream.readObject(ChunkedObjectInputStream.java:197)
    at weblogic.rjvm.MsgAbbrevInputStream.readObject(MsgAbbrevInputStream.java:564)
    at     
weblogic.utils.io.ChunkedObjectInputStream.readObject(ChunkedObjectInputStream.java:193)
    at weblogic.jndi.internal.RootNamingNode_WLSkel.invoke(Unknown Source)
    at weblogic.rmi.internal.BasicServerRef.invoke(BasicServerRef.java:589)
    at weblogic.rmi.cluster.ClusterableServerRef.invoke(ClusterableServerRef.java:230)
    at weblogic.rmi.internal.BasicServerRef$1.run(BasicServerRef.java:477)
    at        
weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:363)
    at weblogic.security.service.SecurityManager.runAs(Unknown Source)
    at weblogic.rmi.internal.BasicServerRef.handleRequest(BasicServerRef.java:473)
    at weblogic.rmi.internal.wls.WLSExecuteRequest.run(WLSExecuteRequest.java:118)
    ... 2 more

We obtain the initial context quite the standard way:

Properties p = new Properties();
p.put(Context.INITIAL_CONTEXT_FACTORY, "weblogic.jndi.WLInitialContextFactory");
p.put(Context.PROVIDER_URL, serverPath);
Context context = new InitialContext(p);

Also calls to any obtained references fail with a similar OptionalDataException. Booting the presentation server alone resolves the issue temporarily.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

宛菡 2024-08-31 22:35:18

最后,OptionalDataExceptions 已成为历史。简而言之,在我们的应用程序代码中,复杂值对象(用作远程方法调用的返回值)具有 HashMap 数据结构作为内部字段。将此字段的类型更改为 SynchronizedMap 后,OptionalDataExceptions 不再发生。似乎在遗留代码中的某个地方,这个 Map 是以非线程安全的方式处理的。

奇怪的是,这对 WLS 8.1 没有造成任何问题,但不知何故导致 WLS 10 进入所有后续远程方法调用(包括 JNDI 查找)开始失败的状态。

Finally the OptionalDataExceptions are history. In short, in our application code a complex value object (used as a return value for remote method invocations) had a HashMap datastructure as an internal field. After changing the type of this field to SynchronizedMap the OptionalDataExceptions stopped occurring. It seems that somewhere in the legacy code this Map is handled in non thread-safe way.

What is strange is that this caused no problems with WLS 8.1, but somehow caused WLS 10 enter a state where all subsequent remote method invocations (including JNDI lookups) started to fail.

南巷近海 2024-08-31 22:35:18

最后我们找到了解决方案(编辑:后来我们发现这不是问题的根本原因,而是一个单独的严重问题。最终的解决方案请参阅下面的答案)。一旦我们开始收到以下异常,我们就开始跟踪原因:

<BEA-000403> <IOException occurred on socket: Socket[addr=/x.x.x.x,port=3266,localport=7001]
 java.net.SocketException: Connection refused.
java.net.SocketException: Connection refused
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at weblogic.socket.SocketMuxer.readReadySocketOnce(SocketMuxer.java:887)
at weblogic.socket.SocketMuxer.readReadySocket(SocketMuxer.java:859)
at weblogic.socket.DevPollSocketMuxer.processSockets(DevPollSocketMuxer.java:120)
at weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:29)
at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:42)
at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:145)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:117)

在与 EJB 服务器运行在不同主机上的表示服务器上,我们可以选择

-Dweblogic.NetworkClassLoadingEnabled=true

明显启用从 EJB 服务器加载类。我们不知道的是,使用此选项可能会导致打开大量网络套接字。使用 netstat 我们能够发现数千个套接字处于 CLOSE_WAIT 或 FIN_WAIT_2 状态。看起来,除了类之外,Web UI 中的所有元素都是从 EJB 服务器加载的,尽管演示服务器上的 war 文件包含所有这些元素。由于 Weblogic 在其启动脚本中删除了文件的 ulimit,因此大量的套接字并没有导致“文件过多”错误消息。使用测试服务器,我们发现用户在 Web UI 上单击一下即可在两台服务器之间打开大约 30 个套接字。

我们删除了此选项并在演示服务器上重新打包了 war 以包含所有需要的类,从而消除了对网络类加载的需要。这导致两台服务器之间的套接字连接数从数千减少到 1。

总而言之,如果可能的话,请避免在 Weblogic 中加载网络类。

Finally we found the solution to this (Edit: later we found out that this was not the root cause of the issue, but a separate serious issue. For the final solution, please see the answer below). Once we started to receive the following exception we got on the tracks of the cause:

<BEA-000403> <IOException occurred on socket: Socket[addr=/x.x.x.x,port=3266,localport=7001]
 java.net.SocketException: Connection refused.
java.net.SocketException: Connection refused
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at weblogic.socket.SocketMuxer.readReadySocketOnce(SocketMuxer.java:887)
at weblogic.socket.SocketMuxer.readReadySocket(SocketMuxer.java:859)
at weblogic.socket.DevPollSocketMuxer.processSockets(DevPollSocketMuxer.java:120)
at weblogic.socket.SocketReaderRequest.run(SocketReaderRequest.java:29)
at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:42)
at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:145)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:117)

On the presentation server, which is running on a different host than the EJB server we had the option

-Dweblogic.NetworkClassLoadingEnabled=true

to obviously enable class loading from the EJB server. What we did not know is that using this option can result in a huge number of network sockets being opened. Using netstat we were able to find out that several thousand sockets were either in CLOSE_WAIT or FIN_WAIT_2 state. It seems that all the elements in the web UI were loaded from the EJB server in addition to the classes despite the fact that the war file on the presentation server contained all these. The huge amount of sockets did not result in "too many files" error messages since Weblogic removes the ulimit for files in its startup script. Using a test server we found out that a single click on the web UI by the user opened around 30 sockets between the two servers.

We removed this option and repackaged the war on the presentation server to contain all the needed classes thus removing the need for network classloading. This resulted in a decrease in the number of socket connections between the two servers from thousands to 1.

In a summary, avoid network class loading in Weblogic if at all possible.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文