Strange behavior of a C# Windows service when more than 90% of RAM is in use

Posted on 2024-12-08 06:16:56

For some time I have been trying to debug my algorithm that synchronizes data between two databases. Everything worked fine for a few months of everyday use, but recently strange things started to happen, e.g.:

  • The "DisableBuyButton" property was sometimes set to true even though it was explicitly set to false in the configuration file. (I stepped through the algorithm far too many times looking for something odd and found nothing; on my machine everything worked.)
  • Products were assigned to different categories than expected.

and many similar mistakes.

Then I remembered that at a CodeCamp meeting someone said that, while debugging ASP.NET applications, they had run into an issue with the garbage collector being in some kind of "panic mode", which caused many unexpected and weird errors.

I checked the amount of free memory: more than 90% was in use. I resolved the issue simply by adding another 1 GB of RAM to the virtual machine, and all those weird things vanished into thin air.

Now the question: HOW IS THIS EVEN REMOTELY POSSIBLE?

//edit: Critical section to ensure only one instance is running:

        lock (this)
        {
            if (WorkStarted)
            {
                return;
            }
            else
            {
                WorkStarted = true;
            }
        }

Comments (1)

甩你一脸翔 2024-12-15 06:16:56

There are too many factors involved to be certain, but one potential issue is a silent fallback to default values (say you attempt to load from the config file every time the property is accessed, but return the default value if you can't).
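
As a minimal sketch of that fallback problem: the `Settings` class, its `read` delegate, and the default of `true` below are all hypothetical, chosen only to mirror the symptom in the question, and are not taken from the asker's code. The first getter swallows every failure and returns a default; the second one makes the failure visible.

```csharp
using System;

// Hypothetical settings wrapper; all names here are illustrative assumptions.
public sealed class Settings
{
    // Stands in for a config-file or cache lookup; may return null or throw.
    private readonly Func<string, string> _read;

    public Settings(Func<string, string> read)
    {
        _read = read;
    }

    // Anti-pattern: any failure is swallowed and replaced by a default value.
    public bool DisableBuyButtonSilent
    {
        get
        {
            try
            {
                return bool.Parse(_read("DisableBuyButton"));
            }
            catch (Exception)
            {
                // The caller cannot tell this apart from a configured "true".
                return true; // assumed default, for illustration only
            }
        }
    }

    // Alternative: report a missing or malformed value instead of hiding it.
    public bool DisableBuyButtonStrict
    {
        get
        {
            string raw = _read("DisableBuyButton");
            bool value;
            if (!bool.TryParse(raw, out value))
            {
                throw new InvalidOperationException(
                    "Setting 'DisableBuyButton' is missing or invalid.");
            }
            return value;
        }
    }
}
```

With the strict getter, a lookup that fails under memory pressure shows up as an exception in the log rather than as a wrong flag in the database.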

Another issue, and potentially the one you encountered, is cache usage. You could have race conditions or time-outs that reset certain values to null, which, combined with the above, can create interesting situations. This probably only happens when there has been no access for quite some time (which usually causes the process to recycle and start fresh upon the next access) or when memory is scarce.

Basically, the garbage collector isn't going to flip bits at random on you (or delete reachable objects), but it may do things that look that way through the layers of abstraction we add.

EDIT:

Sorry for the long-winded response; the topic is very complex, and I am trying to give generally applicable advice.

Let's take a simple example of a cache that orders everything by last-accessed time and puts the oldest 20% by count into WeakReference objects in case the GC runs. You then go to get information from the cache, but since it might have expired, you check whether it exists. This calls the IsAlive property of WeakReference, which returns true. Finally you go to read the object, but before you can, the GC runs and kills the object. The cache gracefully returns null, but now your project has a null where it expected a value. Not to fear: the exception handling routine detects an error and decides to return the default value instead (false in the case of a bool).

All of this combines to create a situation where your code holds an incorrect value without realizing it, and the problem can persist for a while if you don't go back to the cache for some time. Similar situations can occur if your cache responds to a GC run by removing data, potentially in the middle of a "safe" section of code.
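
The check-then-read sequence described above could look roughly like the sketch below. In .NET's non-generic `System.WeakReference`, the liveness check is `IsAlive` and the object is read through `Target`. `ProductSettings` and `RacyCacheRead` are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical cached item; the name is illustrative only.
public sealed class ProductSettings
{
    public bool DisableBuyButton { get; set; }
}

public static class RacyCacheRead
{
    public static bool ReadDisableBuyButton(WeakReference entry)
    {
        try
        {
            if (entry.IsAlive) // step 1: the target has not been collected yet
            {
                // If nothing else references the target, a collection may run here.
                var settings = (ProductSettings)entry.Target; // step 2: may now be null
                return settings.DisableBuyButton; // NullReferenceException if it is null
            }
        }
        catch (NullReferenceException)
        {
            // Swallowed: the caller never learns that the cached entry disappeared.
        }

        return false; // default value, indistinguishable from a real "false"
    }
}
```

The window between the two steps is tiny, so the failure is rare when memory is plentiful and the GC runs seldom. It becomes more likely when collections are frequent, which would fit a machine running at over 90% memory use.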

The best solution to these kinds of issues is to follow your cache's usage guidelines to the letter or, if you roll your own, to make sure you understand the interactions you rely on, such as the way WeakReference and the GC work together, or the odd timing issues involved in hooking into the GC process.

For example, checking IsAlive on a WeakReference is not a good idea if you actually want the value. Instead, you should just read Target, which in the worst case returns null, and test for null. (At that point you could also check IsAlive, since the GC won't interfere while you hold a reference to the object, but that is rather pointless when a null check will do.)
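
A minimal sketch of that advice, again with hypothetical names (`ProductSettings`, `SafeCacheRead`): read `Target` once into a local variable, test it for null, and tell the caller explicitly when the entry is gone rather than handing back a default.

```csharp
using System;

// Hypothetical cached item; the name is illustrative only.
public sealed class ProductSettings
{
    public bool DisableBuyButton { get; set; }
}

public static class SafeCacheRead
{
    // Returns false when the entry was collected, so the caller can reload it.
    public static bool TryReadDisableBuyButton(WeakReference entry, out bool flag)
    {
        // Single read: the local variable is a strong reference, so the object
        // cannot be collected while 'settings' is still being used below.
        var settings = entry.Target as ProductSettings;
        if (settings == null)
        {
            flag = default(bool);
            return false; // cache miss, not a configured "false"
        }

        flag = settings.DisableBuyButton;
        return true;
    }
}
```

On .NET 4.5 and later, the generic `WeakReference<T>` exposes this same pattern directly through `TryGetTarget`.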
