帮助解决 SqlException:在非负载情况下连接超时

发布于 2024-08-04 06:27:46 字数 6039 浏览 11 评论 0原文

我有一个托管我的网站的服务器,该网站的流量几乎为零。
每天都有少数人 (< 20) 进入该网站,并且有一些 RSS 读者订阅了我们发布的一些提要。

几乎每天晚上,RSS 阅读器都会在半夜来找我们,并得到一个异常,即由于连接超时,网站无法连接到 SQL Server。 细节非常奇怪,所以我正在寻找可能出现问题的帮助,因为我不知道从哪里开始寻找了。

我们在 Windows Server 2008 上使用 ASP.Net MVC、Entity Framework 和 SQL Server 2008。该机器是我们从不完全是顶级提供商处获得的专用设备,因此配置可能不是最佳的,或者谁知道是什么否则。
这个盒子也很小,只有 1Gb RAM,但它应该承受我们现在的负载...

我正在复制下面的完整调用堆栈,但首先,我们知道一些事情

  • :当 iTunes 查询我们的网站时,总会发生这种情况。我相信这应该与任何事情无关,但事实是我们只能从 iTunes 获得它。我最好的猜测是,发生这种情况是因为只有 iTunes 在晚上的那个时间询问我们,而没有其他人联系我们。
  • 我们的理论之一是,SQL Server 和 IIS 正在争夺内存,其中之一因未使用而被分页到磁盘,当有人“唤醒它”时,从磁盘读回所有内容需要很长时间进入记忆。这是有可能发生的事情吗? (我有点放弃这个,因为如果可能的话,这听起来像是 SQL Server 中的一个设计问题)
  • 我还考虑了我们泄漏连接的可能性,因为我们可能没有适当地处理 EF 实体(在此处查看我的问题)。这是我通过谷歌搜索这个问题唯一能找到的东西。鉴于我们的负载极低,我将放弃它。
  • 这种情况总是发生在晚上,因此很可能与一段时间没有发生任何事情有关。例如,我非常确定当这些请求命中时,Web 服务器进程会被回收,并且它会启动/重新 JITting 一切。不过,重新 JITting 并不能解释 SQL 超时。

更新:我们按照建议附加了一个探查器,花了很长时间才出现新的异常。这是我们所知道的新东西:

  • 附加分析器大大减少了我们遇到的错误数量。事实上,在通常每天收到几次之后,我们不得不等待 3 或 4 天才能发生一次。一旦我们停止分析器,它就会恢复到正常的错误频率(甚至更糟)。因此探查器具有在一定程度上隐藏此问题的一些效果,但并非完全隐藏。
  • 查看 IIS 请求日志旁边的探查器跟踪,请求和查询之间存在预期的 1-1 对应关系。然而,我时不时地看到正在执行的大量查询与 IIS 日志完全没有关联。事实上,就在记录实际错误之前,我在 3 分钟内收到了 750 个查询,所有这些查询都与 IIS 日志完全无关。查询文本看起来像是 EF 生成的那种难以阅读的废话,而且它们并不完全相同,它们看起来都像来自网站的查询:相同的应用程序名称、用户等。要知道这有多么荒谬也就是说,在2天内,该网站收到了大约 370 个访问数据库的 IIS 请求。
  • 这些无法解释的查询并非来自与之前网站相同的 ClientProcessID,尽管它们可能仍然来自网站(如果该进程同时被回收)。从最后一个解释的查询到第一个未解释的查询之间,几乎有一个小时没有任何活动。
  • 这些我不知道它们来自哪里的长串查询之一就在我记录错误之前出现,所以我相信这是我们应该遵循的线索。
  • 正如我最初预期的那样,当执行抛出错误的查询时,它来自与前一个不同的 ClientProcessID(比前一个无法解释的查询晚了 8 分钟,几乎比前一个 IIS 晚了 1 小时)。对我来说,这意味着工作进程确实被回收了。
  • 这是我绝对不能理解的事情。 IIS 日志显示,在错误请求发生前一分钟,有 4 个请求得到了完美服务,尽管对这些请求的查询根本没有显示在跟踪中。事实上,在这 4 个进展顺利之后,我连续快速抛出了 4 个异常,这 4 个异常也没有出现在跟踪中(这是有道理的,因为如果连接中有超时,则查询应该永远不会被执行,但我也没有在跟踪中看到连接尝试)

所以,简而言之,我对此完全一无所知。我找不到连续快速运行的数百个查询的原因,但我相信这些肯定与问题有关。
我也不知道如何诊断连接问题...
或者,Profiler 跟踪可能会丢失一些根据 IIS 正常执行的查询...

有什么想法吗?


这是例外信息:

System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

System.Data.EntityException: The underlying provider failed on Open. ---> System.Data.SqlClient.SqlException: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
   at System.Data.ProviderBase.DbConnectionPool.GetConnection(DbConnection owningObject)
   at System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection)
   at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
   at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
   at System.Data.SqlClient.SqlConnection.Open()
   at System.Data.EntityClient.EntityConnection.OpenStoreConnectionIf(Boolean openCondition, DbConnection storeConnectionToOpen, DbConnection originalConnection, String exceptionCode, String attemptedOperation, Boolean& closeStoreConnectionOnFailure)
   at System.Data.EntityClient.EntityConnection.OpenStoreConnectionIf(Boolean openCondition, DbConnection storeConnectionToOpen, DbConnection originalConnection, String exceptionCode, String attemptedOperation, Boolean& closeStoreConnectionOnFailure)
   --- End of inner exception stack trace ---
   at System.Data.EntityClient.EntityConnection.OpenStoreConnectionIf(Boolean openCondition, DbConnection storeConnectionToOpen, DbConnection originalConnection, String exceptionCode, String attemptedOperation, Boolean& closeStoreConnectionOnFailure)
   at System.Data.EntityClient.EntityConnection.Open()
   at System.Data.Objects.ObjectContext.EnsureConnection()
   at System.Data.Objects.ObjectQuery`1.GetResults(Nullable`1 forMergeOption)
   at System.Data.Objects.ObjectQuery`1.System.Collections.Generic.IEnumerable<T>.GetEnumerator()
   at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source)
   at System.Data.Objects.ELinq.ObjectQueryProvider.<GetElementFunction>b__1[TResult](IEnumerable`1 sequence)
   at System.Data.Objects.ELinq.ObjectQueryProvider.ExecuteSingle[TResult](IEnumerable`1 query, Expression queryRoot)
   at System.Data.Objects.ELinq.ObjectQueryProvider.System.Linq.IQueryProvider.Execute[S](Expression expression)
   at System.Linq.Queryable.FirstOrDefault[TSource](IQueryable`1 source)
   at MyProject.Controllers.SitesController.Feed(Int32 id) in C:\...\Controller.cs:line 38
   at lambda_method(ExecutionScope , ControllerBase , Object[] )
   at System.Web.Mvc.ReflectedActionDescriptor.Execute(ControllerContext controllerContext, IDictionary`2 parameters)
   at System.Web.Mvc.ControllerActionInvoker.InvokeActionMethod(ControllerContext controllerContext, ActionDescriptor actionDescriptor, IDictionary`2 parameters)
   at System.Web.Mvc.ControllerActionInvoker.<>c__DisplayClassa.<InvokeActionMethodWithFilters>b__7()
   at System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodFilter(IActionFilter filter, ActionExecutingContext preContext, Func`1 continuation)
   at System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodWithFilters(ControllerContext controllerContext, IList`1 filters, ActionDescriptor actionDescriptor, IDictionary`2 parameters)
   at System.Web.Mvc.ControllerActionInvoker.InvokeAction(ControllerContext controllerContext, String actionName)
   at System.Web.Mvc.Controller.ExecuteCore()
   at System.Web.Mvc.MvcHandler.ProcessRequest(HttpContextBase httpContext)
   at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

任何想法都将受到极大的赞赏。

I have a server hosting a website of mine that has almost zero-traffic.
A few people (< 20) enter the site every day, and a few RSS readers are subscribed to some feeds we put out.

Almost every night, an RSS reader will hit us in the middle of the night and get an exception that the website can't connect to the SQL Server because of a Timeout in the connection.
The details are extremely weird, so I'm looking for some help on what could be the issue, since I don't know where to start looking anymore.

We are using ASP.Net MVC, Entity Framework, and SQL Server 2008 over Windows Server 2008. The machine is a dedicated box we got from a not exactly top-tier provider, so things might be configured non-optimally, or who knows what else.
The box is also pretty small, and has only 1Gb RAM, but it should take the kind of load we have for now...

I'm copying the full call stack below, but first, some of the things we know:

  • The error always happens when iTunes is querying our site. I believe this should have nothing to do with anything, but the truth is that we only get it from iTunes. My best guess is that this happens because only iTunes queries us at that time of the night when no one else is hitting us.
  • One of our theories is that the SQL Server and IIS are fighting for memory, and one of them is getting paged to disk out of not being used, and when someone "wakes it up", it takes too long to read everything from disk back into memory. Is this something that could potentially happen? (I'm kind of discarding this since it sounds like a design issue in SQL Server if it were possible)
  • I also thought about the possibility that we're leaking connections, since we may not be disposing of EF entities appropriately (see my question here). This is the only thing I could find by Googling the problem. I'm discarding this given the extremely low load we have.
  • This always happens over the night, so it's very likely something related to the fact that nothing happened for a while. For example, I'm pretty sure that when these requests hit, the web server process got recycled and it's starting up / re-JITting everything. The re-JITting doesn't explain the SQL timeout, though.

UPDATE: We attached a profiler as suggested, and it took quite a while before we had a new exception. This is the new stuff we know:

  • Having the profiler attached enormously reduced the number of errors we got. In fact, after normally getting several per day, we had to wait for 3 or 4 days for this to happen ONCE. Once we stopped the profiler, it went back to the normal error frequency (or even worse). So the profiler has some effect that hides this problem to some extent, but not completely.
  • Looking at the profiler trace next to the IIS requests log, there is an expected 1-1 correspondence between requests and queries. However, every now and then, I see A LOT of queries being executed that have no correllation at all with the IIS log. In fact, right before the actual bug was logged, I got 750 queries in a period of 3 minutes, all of which were completely unrelated to the IIS logs. The query text look like the kind of unreadable crap that EF generates, and they're not all the same, and they all look just like the queries coming from the website: Same ApplicationName, User, etc. To give an idea how ridiculous this is, the site got about 370 IIS requests that hit the DB, in the course of 2 days
  • These unexplained queries did not come from the same ClientProcessID as the previous website ones, although they may still have come from the website, if the process got recycled in the meantime. There was almost an hour of no activity between the last explained query, and the first unexplained one.
  • One of these long streaks of queries that I don't know where they came from came right before the error I got logged, so I believe this is the clue we should be following.
  • As I expected originally, when the query that threw the error was executed, it came from a different ClientProcessID than the previous one, (8 minutes later than the previous unexplained one, and almost exactly one hour later than the previous IIS one). This means, to me, that the worker process had indeed gotten recycled.
  • This is something I absolutely don't understand. The IIS log shows that one minute before the error requests, 4 were perfectly served, although the queries for those don't show up in the trace at all. In fact, after those 4 that went well, I had 4 exceptions thrown in quick succession, those 4 ALSO don't show up in the trace (which makes sense since if there was a Timeout in connection the query should have never gotten executed, but I don't see the connections attempts in the trace either)

So, in short, I'm completely clueless about this. I can't find a reason for those hundreds of queries that get run in quick succession, but I believe those must have something to do with the problem.
I also don't know how to diagnose the connection problems...
Or how the Profiler trace may be missing some queries that according to IIS went through fine...

Any ideas?


This is the exception information:

System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

System.Data.EntityException: The underlying provider failed on Open. ---> System.Data.SqlClient.SqlException: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
   at System.Data.ProviderBase.DbConnectionPool.GetConnection(DbConnection owningObject)
   at System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection)
   at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
   at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
   at System.Data.SqlClient.SqlConnection.Open()
   at System.Data.EntityClient.EntityConnection.OpenStoreConnectionIf(Boolean openCondition, DbConnection storeConnectionToOpen, DbConnection originalConnection, String exceptionCode, String attemptedOperation, Boolean& closeStoreConnectionOnFailure)
   at System.Data.EntityClient.EntityConnection.OpenStoreConnectionIf(Boolean openCondition, DbConnection storeConnectionToOpen, DbConnection originalConnection, String exceptionCode, String attemptedOperation, Boolean& closeStoreConnectionOnFailure)
   --- End of inner exception stack trace ---
   at System.Data.EntityClient.EntityConnection.OpenStoreConnectionIf(Boolean openCondition, DbConnection storeConnectionToOpen, DbConnection originalConnection, String exceptionCode, String attemptedOperation, Boolean& closeStoreConnectionOnFailure)
   at System.Data.EntityClient.EntityConnection.Open()
   at System.Data.Objects.ObjectContext.EnsureConnection()
   at System.Data.Objects.ObjectQuery`1.GetResults(Nullable`1 forMergeOption)
   at System.Data.Objects.ObjectQuery`1.System.Collections.Generic.IEnumerable<T>.GetEnumerator()
   at System.Linq.Enumerable.FirstOrDefault[TSource](IEnumerable`1 source)
   at System.Data.Objects.ELinq.ObjectQueryProvider.<GetElementFunction>b__1[TResult](IEnumerable`1 sequence)
   at System.Data.Objects.ELinq.ObjectQueryProvider.ExecuteSingle[TResult](IEnumerable`1 query, Expression queryRoot)
   at System.Data.Objects.ELinq.ObjectQueryProvider.System.Linq.IQueryProvider.Execute[S](Expression expression)
   at System.Linq.Queryable.FirstOrDefault[TSource](IQueryable`1 source)
   at MyProject.Controllers.SitesController.Feed(Int32 id) in C:\...\Controller.cs:line 38
   at lambda_method(ExecutionScope , ControllerBase , Object[] )
   at System.Web.Mvc.ReflectedActionDescriptor.Execute(ControllerContext controllerContext, IDictionary`2 parameters)
   at System.Web.Mvc.ControllerActionInvoker.InvokeActionMethod(ControllerContext controllerContext, ActionDescriptor actionDescriptor, IDictionary`2 parameters)
   at System.Web.Mvc.ControllerActionInvoker.<>c__DisplayClassa.<InvokeActionMethodWithFilters>b__7()
   at System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodFilter(IActionFilter filter, ActionExecutingContext preContext, Func`1 continuation)
   at System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodWithFilters(ControllerContext controllerContext, IList`1 filters, ActionDescriptor actionDescriptor, IDictionary`2 parameters)
   at System.Web.Mvc.ControllerActionInvoker.InvokeAction(ControllerContext controllerContext, String actionName)
   at System.Web.Mvc.Controller.ExecuteCore()
   at System.Web.Mvc.MvcHandler.ProcessRequest(HttpContextBase httpContext)
   at System.Web.HttpApplication.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

Any ideas will be enormously appreciated.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

轮廓§ 2024-08-11 06:27:46

内存不足

这很可能是内存问题,可能会因其他事情而加剧或触发,但本质上仍然是内存问题。还有另外两种(不太可能)的可能性,您应该首先检查并消除它们(因为这样做很容易):

易于检查可能性:

  1. 您可能启用了“自动关闭”:自动关闭可以正是这样行为,但它很少被打开。要检查这一点,请在 SSMS 中右键单击您的应用程序数据库,选择“属性”,然后选择“选项”窗格。查看“自动关闭”条目并确保其设置为 False。还要检查 tempdb。

  2. SQL 代理作业可能会导致此问题:检查代理的历史记录日志以查看事件期间是否有任何作业持续运行。请记住还要检查维护作业,因为重建索引之类的事情在运行时经常被认为是性能问题。现在这些不太可能是候选者,只是因为它们通常不会受到探查器的影响。

为什么它看起来像内存问题:

如果这些没有显示任何内容,那么您应该检查内存问题。我怀疑内存是您的情况的原因,因为:

  • 您有 1 GB 内存:虽然从技术上讲,这高于 SQL Server 的最小值,但它远低于 SQL Server 的建议值,并且远低于我的经验对于生产环境来说是可以接受的,即使对于负载较轻的服务器也是如此。

  • 您在同一个机器上运行 IIS 和 SQL Server:这本身并不建议这样做,很大程度上是因为会导致内存争用,但由于只有 1 GB 内存,因此会导致 IIS、应用程序、SQL服务器、操作系统和任何其他任务和/或维护都在争夺很少的内存。 Windows 管理此问题的方法是通过积极地从非活动进程中夺走内存来为活动进程提供内存。在这种情况下,像 SQL Server 这样的大型进程可能需要数秒甚至数分钟的时间才能恢复足够的内存,以便能够完全为请求提供服务。

  • Profiler 解决了 90% 的问题:这是一个很大的线索,表明内存可能是问题所在,因为通常情况下,Profiler 之类的东西对这个特定问题确实有这样的效果:Profiler 任务使 SQL Server 只是一个<一直有点活跃。通常,这足以将其排除在操作系统的“清道夫”列表之外,或者至少在一定程度上减少其影响。

如何检查内存是否是罪魁祸首:

  1. 关闭分析器:它对问题有海森堡效应,因此您必须将其关闭,否则您将无法可靠地看到问题。

  2. 从另一个机器运行系统监视器 (perfmon.exe),它远程连接到运行 SQL Server 和 IIS 的机器上的性能收集服务。您可以通过首先删除三个默认统计信息(它们仅是本地的),然后添加所需的统计信息(如下)来最轻松地做到这一点,但请确保更改第一个下拉列表中的计算机名称以连接到您的 SQL

  3. 通过在 perfmon 上创建“计数器日志”将收集到的数据发送到文件。如果您对此不熟悉,那么最简单的方法可能是将数据收集到一个制表符或逗号分隔的文件中,您可以使用 Excel 打开

  4. 设置您的性能监视器以收集到文件并向其中添加以下计数器:

    -- 处理器\%处理器时间[总计]

    -- PhysicalDisk\% 空闲时间[每个磁盘]

    -- 物理磁盘\平均。磁盘队列长度[每个磁盘]

    -- 内存\页/秒

    -- 内存\页面读取/秒

    -- 内存\可用兆字节

    -- 网络接口\字节总数/秒[对于正在使用的每个接口]

    -- 进程\%处理器时间[见下文]

    -- 进程\页面错误/秒[见下文]

    -- Process\Working Set [见下文]

  5. 对于进程计数器(上面),您希望包括 sqlserver.exe 进程、任何 IIS 进程和任何稳定的应用程序进程。请注意,这仅适用于“稳定”进程。根据需要不断重新创建的进程无法以这种方式捕获,因为无法在它们存在之前对其进行指定。

  6. 在问题最常发生的时间内将此集合运行到文件中。将收集间隔设置为接近 10-15 秒。 (这会收集大量数据,但您需要此解决方案来挑选单独的事件)。

  7. 发生一个或多个事件后,停止收集,然后使用 Excel 打开收集的数据文件。您可能需要重新格式化时间戳列,使其可见并显示小时、分钟和秒。使用 IIS 日志查找事件发生的确切时间,然后查看 perfmon 数据以了解事件前后发生的情况。特别是,您想查看其工作集之前是否较小,之后是否较大,其间存在大量页面错误。这是此问题最明显的迹象。

解决方案:

要么将 IIS 和 SQL Server 分离到两个不同的机器上(首选),要么向该机器添加更多内存。我认为至少 3-4 GB 应该是。

那个奇怪的 EF 东西怎么样?

这里的问题是,它很可能是次要问题,或者只是造成主要问题的原因。请记住,Profiler 使 90% 的事件消失,因此剩下的可能是另一个问题,或者可能只是问题最极端的加剧因素。由于它的行为,我猜测它要么正在循环其缓存,要么有一些其他应用程序服务器进程的后台维护。

Not Enough Memory

This is very likely a Memory problem, perhaps aggravated or triggered by other things, but still inherently a memory problem. there are two other (less likely) possibilities, that you should check and eliminate first (because it is easy to do so):

Easy To Check Possibilities:

  1. You may have "Auto Close" enabled: Auto Close can have exactly this behavior, however it is rare for it to be turned on. To check this, in SSMS right-click on your application database, select "Properties", and then select the "Options" pane. Look at the "Auto Close" entry and make sure that it is set to False. Check tempdb also.

  2. SQL Agent Jobs may be causing it: Check the Agent's History Log to see if there were any jobs consistently running during the events. Remember to check maintenance jobs too, as things like Rebuilding Indexes are frequently cited as performance problems while they are running. These are unlikely candidates now, only because they would not normally be affected by the Profiler.

Why It Looks Like a Memory Problem:

If those do not show anything, then you should check for memory problems. I suspect Memory as the cause in your case because:

  • You have 1 GB of Memory: Although this is technically above the Minimum for SQL Server, it is way below the recommended for SQL Server, and way below what in my experience is acceptable for production, even for a lightly loaded server.

  • You are running IIS and SQL Server on the same box: This is not recommended by itself, in large part because of the contention for memory that results, but with only 1 GB of memory it results in IIS, the app, SQL Server, the OS and any other tasks and/or maintenance all fighting for very little memory. The way the Windows manages this is to give memory to the active processes by aggressively taking it away from the non-active processes. It can take many seconds, or even minutes for a large process like SQL Server to get back enough of its memory to be able to completely service a request in this situation.

  • Profiler made 90% of the problem go away: This is a big clue that memory is likely the problem, because typically, things like Profiler have exactly this effect on this particular problem: the Profiler task keeps the SQL Server just a little bit active all of the time. Frequently, this is just enough activity to either keep it off the OS's "scavenger" list, or at least reduces it's impact somewhat.

How to Check For Memory as the Culprit:

  1. Turn Off the Profiler: Its having a Heisenberg effect on the problem, so you have to turn it off or you will not be able to see the problem reliably.

  2. Run a System Monitor (perfmon.exe) from another box, that remotely connects to the perfomrance collection service on the box that your SQL Server and IIS are running on. you can most easily do this by first removing the three default stats (they are local only), and then add in the needed stats (below), but make sure to change the Computer name in the first drop-down to connect to your SQL box.

  3. Send the collected data to a file by creating a "Counter Log" on perfmon. If you are unfamiliar with this, then the easiest thing to do is probably to collect the data to a tab or comma separated file that you can open with Excel to analyze.

  4. Set up your perfmon to collect to a file and add the following counters to it:

    -- Processor\%Processor Time[Total]

    -- PhysicalDisk\% Idle Time[for each disk]

    -- PhysicalDisk\Avg. Disk Queue Length[for each disk]

    -- Memory\Pages/sec

    -- Memory\Page Reads/sec

    -- Memory\Available MBytes

    -- Network Interface\Bytes Total/sec[for each interface in use]

    -- Process\% Processor Time[see below]

    -- Process\Page Faults/sec[see below]

    -- Process\Working Set [see below]

  5. For the Process counters (above) you want to include the sqlserver.exe process, any IIS processes, and any stable application processes. Note that this will ONLY work for "stable" processes. Processes that are continually being re-created as needed, cannot be captured this way because there is no way to specify them before they exist.

  6. Run this collection to a file during the time that the problem most frequently happens. Set the collection interval to something close to 10-15 secs. (this collects a lot of data, but you will need this resolution to pick out the separate events).

  7. After you have one or more incidents, stop the collection and then open your colleced data file with Excel. You will probably have to reformat the timestamp column to be usefully visible and show hours minutes and seconds. Use your IIS log to find the exact time of the incidents, then look at the perfmon data to see what was going on before and after the incident. In particular you want to see if its working set was small before and was large after, with a lot of page faulting in between. That's the clearest sign of this problem.

SOLUTIONS:

Either separate IIS and SQL Server onto two different boxes (preferred) or else add more memory to the box. I would think that 3-4 GB should be a minimum.

What About That Weird EF Stuff?

The problem here is that it is most likely either peripheral or only contributory to your main problem. Remember that Profiler made 90% of your incidents go away, so what remains, may be a different problem, or it may be only the most extreme aggravator of the problem. Because of its behavior I would guess that it is either cycling its cache or there is some other background maintenance of the application server processes.

扬花落满肩 2024-08-11 06:27:46

我会将超时的时间戳与夜间备份的执行时间进行比较。如果它们一致,您可以将 RSS feed 设置为静态。

另一件要尝试的事情(即使它不完全是一个答案)是立即运行 sp_who 当您遇到超时异常时。它不会捕获所有内容(有问题的过程可能会在您运行此程序时完成),但您可能会很幸运。

您还可以在晚上回家时启动 SQL Profiler,并在第二天早上再次看到错误时逐步执行该活动。请确保不要从服务器本身运行它(我很确定它会在启动时提醒您这一点)。

编辑:解决您的更新问题。

EF 是否正在更新/创建其缓存?它可以解释一次查询的丰富性以及为什么后来没有查询命中数据库。

除此之外,看来你有一个 heisenbug。我认为您唯一需要添加的是更多的日志记录(到文件或事件日志)。

I would compare the timestamp of the timeout with the execution time of your nightly backup. If they coincide, you could set your RSS feed to be static for that time.

Another thing to try (even though it isn't exactly an answer) is to immediately run sp_who when you get a timeout exception. It won't catch everything (the offending process could be done by the time you run this) but you may get lucky.

You can also fire up SQL Profiler when you head home for the night and step through the activity the next morning if you see the error again. Just be sure to not run it from the server itself (I'm pretty sure it reminds you of this when it starts).

EDIT: Addressing your update.

Is EF updating/creating its cache? It could explain the abundance of queries at one time and why no queries had database hits later.

Other than that, it appears you have a heisenbug. The only thing I can think for you to add is a lot more logging (to a file or the event log).

意中人 2024-08-11 06:27:46

它闻到了一个同时运行的定时任务的味道。
正如 RBarryYoung 所说......一些夜间备份或者可能是其他东西
您有服务器的 root 访问权限吗?
你能看到 crontab 吗?

是否是 SQL Server 之上的某个全文索引插件在您遇到问题时运行其重新索引过程?

It smells a cronned thing that runs at the same time.
As RBarryYoung says.. some nightly backup or it could be something else
Do you have root access to the server?
Can you see the crontabs?

Could it be some full text indexing plugin on top of the SQL server that runs its reindexing procedures close to the time you are experiencing the issues?

寂寞清仓 2024-08-11 06:27:46

就我而言,当我安装 sqlserver 2008 r2 sp3 时,问题就消失了。

服务器:Windows 7+SqlServer 2008 R2(开发者版)
客户端:Raspberrypi 3B+、Asp.net Core+EF Core

In my case, when I installed sqlserver 2008 r2 sp3,The problem goes away.

Server:Windows 7+SqlServer 2008 R2 (developer edition)
client:Raspberrypi 3B+ ,Asp.net Core+EF Core

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文