如何检测 Oracle 连接中断/停滞?
在我们的服务器/客户端设置中,我们遇到了一些奇怪的行为。客户端是一个 C/C++ 应用程序,它使用 OCI 连接到 Oracle 服务器(使用 OTL 库) 。
数据库服务器时不时地会以某种方式死亡(是的,这是核心问题,但从应用程序端来看,我们无法解决它,但无论如何都必须处理它),机器不再响应新请求/connections 但现有的连接(例如 Oracle 连接)不会丢失或超时。发送到数据库的查询永远不会再成功返回。
Oracle 提供了哪些可能性(如果有)来从客户端应用程序端检测这些停滞的连接并以或多或少安全的方式恢复?
In our server/client-setup we're experiencing some weird behaviour. The client is a C/C++-application which uses OCI to connect to an Oracle server (using the OTL library).
Every now and then the DB server dies in a way (yes this is the core issue, but from application-side we're unable to solve it but have to deal with it anyway), that the machine does not respond anymore to new requests/connections but the existing ones, like the Oracle-connections, do not drop or time out. Queries sent to the DB just never return successfully anymore.
What possibilities (if any) are provided by Oracle to detect these stalled connections from the client-application side and recover in a more or less safe way?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
这是 Oracle 11.1.0.6 之前的一个错误(或称为功能),他们说 Oracle 11g 第 1 版上的补丁(补丁 11.1.0.7)已修复。需要看看。
如果发生这种情况,您将必须取消(杀死)执行此操作的线程。
虽然不是好方法
This is a bug in Oracle ( or call it a feature ) till 11.1.0.6 and they said the patch on Oracle 11g release 1 ( patch 11.1.0.7 ) which has the fix. Need to see that.
If it happens you will have to cancel ( kill ) the thread performing this action.
Not good approach though
在我的所有数据库模式中,我有一个包含一个常量记录的表。只需通过简单的 SQL 请求定期轮询此类表即可。所有其他方法都不可靠。
In all my DB schema i have a table with one constant record. Just poll such table periodically by simple SQL request. All other methods unreliable.
OTL 中有一个
set_timeout
API 对此可能有用。编辑:实际上,忽略这一点。
set_timeout
不适用于 OCI。查看 此处 中的set_timeout
描述,其中描述了可与 OCI 一起使用的技术There's a
set_timeout
API in OTL that might be useful for this.Edit: Actually, ignore that.
set_timeout
doesn't work with OCI. Have a look at theset_timeout
description from here where it describes a technique that can be used with OCI听起来您需要向数据库发出查询(例如 SELECT * FROM Dual;),然后如果数据库在指定的时间内没有响应,则假设服务器已死亡并做出反应因此。恐怕我不懂 C/C++,但是您可以使用多线程来触发语句然后等待响应,而不挂起应用程序吗?
Sounds like you need to fire off a query to the database (eg
SELECT * FROM dual;
), then if the database hasn't responded within a specified amount of time, assume the server has died and react accordingly. I'm afraid I don't know C/C++, but can you use multi-threading to fire off the statement then wait for the response, without hanging the application?这有效 - 我已经完成了您正在寻找的事情。
让父进程 (A) 创建子进程 (B)。子进程(B)连接到数据库,
执行查询(类似于“select 1 from a_table” - 如果您避免使用“dual”并创建自己的表,您将获得更好的性能)。如果 (B) 成功,则它会写出成功并退出。 (A) 正在等待指定的时间。我用了15秒。如果 (A) 检测到 (B) 仍在运行 - 那么它可以假设数据库挂起 - 它会杀死 (B) 并采取必要的操作(例如用短信给我打电话)。
如果您将 SQL*NET 配置为使用超时,您可能会注意到大型查询将因此而失败。 OCI set_timeout 配置也会导致这种情况。
This works - I have done exactly what you are looking for.
Have a parent process (A) create a child process (B). The child process (B) connects to the database,
performs a query (something like "select 1 from a_table" - you will get better performance if you avoid using "dual" for this and create your own table). If (B) is successful then it writes out that it was successful and exits. (A) is waiting for a specified amount of time. I used 15 seconds. If (A) detects that (B) is still running - then it can assume that the database is hung - it Kills (B) and takes necessary actions (Like calling me on the phone with a SMS).
If you configure SQL*NET to use a timeout you will probably notice that large queries will fail because of it. The OCI set_timeout configuration will also cause this.
有一种手动方法可以避免这种情况。您可以打开防火墙并在每个指定的时间段后执行诸如 ping 数据库之类的操作。这样数据库连接就不会丢失。
主意
There is a manual way to avoid this. You can open a firewall and do something like ping database after every specified duration of time. In this way the database connection will not get lost.
idea