SQL Server Alerts - Best Practices
Which SQL Server alerts do you always set up for every database? What do you always monitor, regardless of the database?
You should monitor, and be alerted on, severity levels 17 through 25.
Severity levels 17 through 19 require intervention from a DBA. They are not as serious as 20 through 25, but the DBA still needs to be alerted.
17 Insufficient Resources
18 Nonfatal Internal Error Detected
19 Error in Resource
Severity levels 20 through 25 are fatal errors, which can mean SQL Server is no longer working:
20 SQL Error in Current Process
21 SQL Fatal Error in Database dbid Processes
22 SQL Fatal Error Table Integrity Suspect
23 SQL Fatal Error: Database Integrity Suspect
24, 25 Hardware Error
For more information on the severity levels, see http://msdn.microsoft.com/en-us/library/aa937483(SQL.80).aspx
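As a rough sketch of how the list above could be turned into SQL Server Agent alerts, the Python snippet below prints one `msdb.dbo.sp_add_alert` call per severity level. The alert naming scheme, the 60-second delay, and the helper names (`severity_alert_sql`, `build_script`) are assumptions made for illustration; review the generated T-SQL before running it against a server.

```python
# Severity levels and descriptions as listed in the answer above.
SEVERITIES = {
    17: "Insufficient Resources",
    18: "Nonfatal Internal Error Detected",
    19: "Error in Resource",
    20: "SQL Error in Current Process",
    21: "SQL Fatal Error in Database dbid Processes",
    22: "SQL Fatal Error Table Integrity Suspect",
    23: "SQL Fatal Error: Database Integrity Suspect",
    24: "Hardware Error",
    25: "Hardware Error",
}


def severity_alert_sql(severity: int, description: str, delay_seconds: int = 60) -> str:
    """Build one EXEC msdb.dbo.sp_add_alert statement for a severity level."""
    # The name format is an illustrative choice, not a required convention.
    name = f"Severity {severity:03d} - {description}".replace("'", "''")
    return (
        "EXEC msdb.dbo.sp_add_alert "
        f"@name = N'{name}', "
        "@message_id = 0, "  # 0 means the alert is keyed on severity, not on one error
        f"@severity = {severity}, "
        f"@delay_between_responses = {delay_seconds}, "  # assumed throttle, in seconds
        "@include_event_description_in = 1;"  # 1 = include the error text in e-mail
    )


def build_script() -> str:
    """Return the full script, one statement per line, ordered by severity."""
    return "\n".join(severity_alert_sql(s, d) for s, d in sorted(SEVERITIES.items()))


if __name__ == "__main__":
    print(build_script())
```

An alert created this way only records the event; it still has to be tied to an operator (for example with `sp_add_notification`) before anyone is actually notified.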
I would also add alerts on errors 823, 824, and 832, since these errors indicate corruption (823 and 824 are I/O errors; 832 points to a page that changed in memory).
For more information see http://www.sqlservercentral.com/articles/Memory+Corruption/93424/ and http://www.sqlskills.com/BLOGS/PAUL/post/Dont-confuse-error-823-and-error-832.aspx
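A minimal sketch of alerts keyed on those specific error numbers follows. It generates an `sp_add_alert` call per error and an `sp_add_notification` call that e-mails an operator. The operator name `DBA Team` is hypothetical and must already exist in SQL Server Agent; the function name `corruption_alert_script` is likewise just for illustration.

```python
from typing import List

# Error numbers named in the answer above.
CORRUPTION_ERRORS = (823, 824, 832)


def corruption_alert_script(operator: str) -> List[str]:
    """Return T-SQL statements: one alert plus one e-mail notification per error."""
    op = operator.replace("'", "''")  # escape quotes for the T-SQL string literal
    statements = []
    for message_id in CORRUPTION_ERRORS:
        alert = f"Error {message_id}"  # assumed naming scheme
        statements.append(
            f"EXEC msdb.dbo.sp_add_alert @name = N'{alert}', "
            f"@message_id = {message_id}, @severity = 0, "  # keyed on the error number
            "@include_event_description_in = 1;"
        )
        statements.append(
            f"EXEC msdb.dbo.sp_add_notification @alert_name = N'{alert}', "
            f"@operator_name = N'{op}', @notification_method = 1;"  # 1 = e-mail
        )
    return statements


if __name__ == "__main__":
    print("\n".join(corruption_alert_script("DBA Team")))  # hypothetical operator
```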
In addition to log alerts, we always turn on hardware alerts for all of our servers. Hardware errors, such as inode errors, can take down a server just as fast as 5xx errors. We've seen customers' PDF export capabilities fail because code on a server failed to delete old exports, which filled up the disk until exports stopped working altogether. Regular log alerts won't warn you about this kind of problem until it's too late, but monitoring disk space would have.
Unfortunately, log management solutions don't set these alerts up for you automatically, so sometimes you find out you needed the alerts the hard way: when you've already got a problem.
We wrote a blog post about why it's important to pair hardware metric alerting with standard log alerts: https://blog.bluematador.com/posts/how-essential-alerts-could-have-saved-the-millennium-falcon/
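To make the disk-space point concrete, here is a small sketch of a threshold check using Python's standard `shutil.disk_usage`. The 90% threshold, the monitored path, and the function names are assumptions; a real setup would run this on a schedule and route the message to whatever alerting channel is already in place.

```python
import shutil
from typing import Optional


def disk_used_fraction(path: str) -> float:
    """Return the fraction (0.0 to 1.0) of the volume holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def disk_alert(used_fraction: float, threshold: float = 0.90) -> Optional[str]:
    """Return an alert message when usage reaches the threshold, else None."""
    if used_fraction >= threshold:
        return f"disk is {used_fraction:.0%} full (threshold {threshold:.0%})"
    return None


if __name__ == "__main__":
    # "." is a placeholder; point this at the volume that holds exports or data files.
    message = disk_alert(disk_used_fraction("."))
    if message:
        print("ALERT:", message)
```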