如何解决我的 Azure 角色中与 SQL Azure 的连接突然丢失的问题?
我的 Azure 角色从数据库获取要处理的内容 - 它拥有一个 System.Data.SqlClient.SqlConnection 实例
并定期创建一个 SqlCommand
实例并执行 SQL 查询。
现在偶尔(通常是几天一次)运行查询会触发 SqlException
异常
该服务在处理您的请求时遇到错误。请再试一次。错误代码 40143。 当前命令发生严重错误。结果(如果有)应被丢弃。
我已经见过很多次了,现在我的代码捕获了它,在 SqlConnection
实例上调用 Dispose()
,然后重新打开连接并重试查询。后者通常会导致另一个 SqlException
异常
超时已过。操作完成之前超时时间已过,或者服务器没有响应。
这看起来很像 SQL Azure 服务器没有响应或由于某种原因不可用。
目前我的代码没有捕获后一个异常,它会在 RoleEntryPoint.Run()
之外传播,并且角色会重新启动。重新启动通常需要大约十分钟,完成后问题就会消失一天左右。
我不喜欢我的角色重新启动 - 这需要一段时间并且我的服务功能受到阻碍。我想做一些更聪明的事情。
解决这个问题的策略是什么?我应该重试查询几次吗?重试次数和间隔是多少?我应该做点别的吗?我什么时候放弃并让角色重新开始?
My Azure role grabs stuff to process from a database - it holds an instance of System.Data.SqlClient.SqlConnection
and periodically creates an SqlCommand
instance and executes an SQL query.
Now once in a while (usually once in several days) running a query will trigger an SqlException
exception
The service has encountered an error processing your request. Please try again. Error code 40143.
A severe error occurred on the current command. The results, if any, should be discarded.
Which I've already seen many times and now my code catches it, calls Dispose()
on the SqlConnection
instance and then reopens the connection and retries the query. The latter typically results in another SqlException
exception
Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Which looks pretty much like SQL Azure server not responding or being unavailable for whatever reason.
Currently my code doesn't catch the latter exception, it is propagated outside RoleEntryPoint.Run()
and the role is restarted. Restart typically takes about ten minutes and once it completes the problem is gone for a day or so.
I don't like my role restarting - it's takes a while and my service functionality is hindered. I'd like to do something smarter.
What would be a strategy to address this problem? Should I retry the query several times and how many times and with what interval? Should I do something else? When do I give up and let the role just restart?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我强烈建议您查看 SQL Azure 的瞬态故障处理框架
这将帮助您处理连接和查询尝试的重试逻辑,我在生产中使用它并且效果很好。 technet 上还有一篇不错的文章 technet 这可能会有一些用处。
[编辑:2013 年 10 月 17 日]
看起来这已被模式和实践团队采纳,位于 瞬态故障处理应用程序块
I would strongly recommend you have a look at the Transient Fault Handling Framework for SQL Azure
This will help you handle retry logic for both connection and query attempts, I am using this in production and it works great. There is also a nice article on technet that might be of some use.
[EDIT: 17 Oct 2013]
It looks like this has been picked up by the patterns and practices team at The Transient Fault Handling Application Block
我们使用 TransientFaultHandling,它不能处理所有奇怪的异常。
例如,昨天弹出了这样一个消息:
服务在处理您的请求时遇到错误。请再试一次。错误代码 40143。
当前命令发生严重错误。如果有结果,则应丢弃。 ,System.Data.SqlClient.SqlConnection.OnError(SqlException异常,布尔breakConnection)处的堆栈跟踪
在 System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning()
在 System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, ...
即使如此,合理的方法也适用:
典型工作流示例:
”的 SQL Azure 调用期间发生问题,只需退出而不删除消息,它会在可见性超时到期后再次弹出。
如果在“无害 是非常常见的方法:组织成类似事务的块,将块包装到 try-catch 中,在异常时巧妙地回滚,并且永远不要假设某些调用不会失败。
We use TransientFaultHandling and it doesn't handle all of the strange exceptions.
For example, this one popped up yesterday:
The service has encountered an error processing your request. Please try again. Error code 40143.
A severe error occurred on the current command. The results, if any, should be discarded. , stacktrace at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning()
at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, . . .
The reasonable approach that will work even with this:
Example of a typical workflow:
Wrap B through C together in a try-catch. If something happens during 'harmless' SQL Azure call, simply bail out without deleting the message, it will simply pop up again after visibility timeout expires.
Actually, this is very common approach: organize into transaction-like blocks, wrap block into try-catch, neatly roll back on exception. And never, never assume that some calls do not fail. All call fail from time to time.