需要帮助追踪偶发的“无法序列化会话状态”问题服务器错误
我们定期收到用户关于以下服务器错误的报告。
[OutOfMemoryException: Exception of type System.OutOfMemoryException was thrown.]
[HttpException (0x80004005): Unable to serialize the session state. Please note that non-serializable objects or MarshalByRef objects are not permitted when session state mode is ‘StateServer’ or ‘SQLServer’
一旦处于出现此错误的状态,该错误是否可以在本地重现就显得很不确定。如果是的话,那么我们通常可以在几分钟内重现它们,但不能在每个点击的页面上重现它们。这种情况通常会自行消失,并在我们重新与用户联系时自行解决。
Web 服务在工作时间内大约有 90-100 个活动连接。该服务器上唯一的其他站点是该站点的临时版本,该站点很少受到访问。会话状态与应用程序数据库存储在同一 SQLServer 实例上,该实例位于相当大的虚拟机集群上。在此过程中,Web 服务器或 SQLServer 似乎都没有受到负担(无论是处理器还是内存方面)。
出现错误的页面的分布似乎与每个页面的正态分布相当。就发生时间而言似乎没有任何模式。平均而言,周末的错误确实较少(这与正常的网站负载相关),但即使如此,似乎也不一致。
记录的错误与记录的任何类型的性能监视器事件之间似乎也没有关联。这包括一系列性能监视器计数器,其中包括:
.NET CLR Jit(w3wp)\notal # of IL Bytes Jitted
.NET CLR Jit(w3wp)\IL Bytes Jitted / sec
.NET CLR Jit(w3wp)\% Time in Jit
.NET CLR Jit(w3wp)\# of Methods Jitted
.NET CLR Jit(w3wp)\# of IL Bytes Jitted
ASP.NET Apps v1.1.4322(__Total__)\Requests Failed
ASP.NET Apps v1.1.4322(__Total__)\Errors Unhandled During Execution/Sec
ASP.NET Apps v1.1.4322(__Total__)\Errors Unhandled During Execution
ASP.NET Apps v1.1.4322(__Total__)\Cache Total Turnover Rate
ASP.NET Apps v1.1.4322(__Total__)\Errors During Preprocessing
ASP.NET Apps v1.1.4322(__Total__)\Errors During Execution
ASP.NET Apps v1.1.4322(__Total__)\Requests Executing
ASP.NET Apps v1.1.4322(__Total__)\Requests Total
ASP.NET Apps v1.1.4322(__Total__)\Errors Total
ASP.NET Apps v1.1.4322(__Total__)\Sessions Abandoned
ASP.NET Apps v1.1.4322(__Total__)\Errors Total/Sec
ASP.NET Apps v1.1.4322(__Total__)\Anonymous Requests/Sec
ASP.NET Apps v1.1.4322(__Total__)\Requests/Sec
ASP.NET Apps v1.1.4322(__Total__)\Session SQL Server connections total
ASP.NET Apps v1.1.4322(__Total__)\Cache Total Hit Ratio
ASP.NET v1.1.4322\Requests Current
ASP.NET v1.1.4322\Request Execution Time
Memory\Pages/sec
Bytes Total/sec
PhysicalDisk(_Total)\Avg. Disk Queue Length
Processor(_Total)\% Processor Time
Web Service Cache\File Cache Hits %
Web Service Cache\File Cache Misses
Web Service Cache\File Cache Hits
Web Service(_Total)\Current Connections
Web Service(_Total)\Post Requests/sec)
我在日志中看到的唯一模式与这些错误的发生无关,但却是我能看到的唯一模式。查看 perfmon 日志,我们看到一种模式,其中“Jitted 的 IL 字节总数”、“Jitted IL 字节数/秒”、“Jit 时间百分比”、“Jitted 方法数”和“Jitted IL 字节数” ” 暂存站点的计数器(不应获得任何流量)在 20-50 分钟内不会提取数据,此后“IL Bytes Jitted / sec”立即出现峰值,并且“% Time”出现跳跃in Jit” 2-20 分钟,主站点的利用率高达 99%。
如果有人对可能导致此问题的原因有任何想法,或者有类似问题的经验,我将不胜感激。
谢谢!
We have been receiving reports of the following server error periodically from users.
[OutOfMemoryException: Exception of type System.OutOfMemoryException was thrown.]
[HttpException (0x80004005): Unable to serialize the session state. Please note that non-serializable objects or MarshalByRef objects are not permitted when session state mode is ‘StateServer’ or ‘SQLServer’
Once in a state where this error appears, it appears to be hit or miss whether the errors are reproducible locally. If they are, then we can usually reproduce them for a couple minutes, but not on every page hit. This usually tapers off on its own and usually has resolved itself by the time we get back in contact with the users.
The Web Service has around 90-100 active connections during business hours. The only other site on this server is the staging version of this site, which gets hit very infrequently. The Session State is stored on the same SQLServer instance as the application database which is housed on a fairly large cluster of virtual machines. Neither the Web Server or the SQLServer seemed to be taxed (either processor or memory-wise) while this is going on.
The distribution of which pages are erroring seems to be comparable to the normal distribution for each page. There doesn't appear to be any pattern in terms of times of occurrence. We do have less errors on average on weekends (which correlates to normal site load), but even this appears to not be consistent.
There also doesn't appear to be a correlation between the errors logged and any kind of logged performance monitor events. This includes an array of perfmon counters including:
.NET CLR Jit(w3wp)\notal # of IL Bytes Jitted
.NET CLR Jit(w3wp)\IL Bytes Jitted / sec
.NET CLR Jit(w3wp)\% Time in Jit
.NET CLR Jit(w3wp)\# of Methods Jitted
.NET CLR Jit(w3wp)\# of IL Bytes Jitted
ASP.NET Apps v1.1.4322(__Total__)\Requests Failed
ASP.NET Apps v1.1.4322(__Total__)\Errors Unhandled During Execution/Sec
ASP.NET Apps v1.1.4322(__Total__)\Errors Unhandled During Execution
ASP.NET Apps v1.1.4322(__Total__)\Cache Total Turnover Rate
ASP.NET Apps v1.1.4322(__Total__)\Errors During Preprocessing
ASP.NET Apps v1.1.4322(__Total__)\Errors During Execution
ASP.NET Apps v1.1.4322(__Total__)\Requests Executing
ASP.NET Apps v1.1.4322(__Total__)\Requests Total
ASP.NET Apps v1.1.4322(__Total__)\Errors Total
ASP.NET Apps v1.1.4322(__Total__)\Sessions Abandoned
ASP.NET Apps v1.1.4322(__Total__)\Errors Total/Sec
ASP.NET Apps v1.1.4322(__Total__)\Anonymous Requests/Sec
ASP.NET Apps v1.1.4322(__Total__)\Requests/Sec
ASP.NET Apps v1.1.4322(__Total__)\Session SQL Server connections total
ASP.NET Apps v1.1.4322(__Total__)\Cache Total Hit Ratio
ASP.NET v1.1.4322\Requests Current
ASP.NET v1.1.4322\Request Execution Time
Memory\Pages/sec
Bytes Total/sec
PhysicalDisk(_Total)\Avg. Disk Queue Length
Processor(_Total)\% Processor Time
Web Service Cache\File Cache Hits %
Web Service Cache\File Cache Misses
Web Service Cache\File Cache Hits
Web Service(_Total)\Current Connections
Web Service(_Total)\Post Requests/sec)
The only pattern I can see in the logs doesn't correlate to the occurrence of these errors, but is the only pattern I can see. Looking at the perfmon logs we are seeing a pattern where the "Total # of IL Bytes Jitted", "IL Bytes Jitted / sec", "% Time in Jit", "# of Methods Jitted", and "# of IL Bytes Jitted" counters for the staging site (which shouldn't be getting any traffic) doesn't pull data for a 20-50 minute period after which there is an immediate spike in "IL Bytes Jitted / sec" and a jump in "% Time in Jit" for 2-20 minute of up to 99% for the main site.
If anyone has any ideas on what could be causing this, or has had experience with a similar issue I would be grateful for any input.
Thanks!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
这是一个疯狂的尝试,因为我最近遇到了类似的问题(不完全相同)。
您是否在启动时使用 /3GB 标志对于你的服务器?
即使您不知道,您也可以通过 perfmon(在内存下)查看免费系统页表条目。您应该可以访问 15K。任何低于 5-10K 的内容都是“不好的”,并且在保存到会话时可能会导致 OOM 异常。
http:// /blogs.technet.com/b/clint_huffman/archive/2008/04/07/free-system-page-table-entries-ptes.aspx
This is a wild stab, since I had a similar problem recently (not exactly the same).
Are you using the /3GB flag at startup for your server?
Even if you're not, you may take a look at the Free System Page Table Entries via perfmon (under Memory). You should have in access of 15K. Anything under 5-10K is "bad", and can result in OOM exceptions when saving to session.
http://blogs.technet.com/b/clint_huffman/archive/2008/04/07/free-system-page-table-entries-ptes.aspx