Gracefully finalizing the SoftReference referent

Posted on 2024-08-09 09:38:01

I am using a search library which advises keeping the search handle object open, because this benefits the query cache. Over time I have observed that the cache tends to get bloated (a few hundred megabytes and still growing) and OOMs have started to kick in. There is no way to enforce a limit on this cache, nor to plan how much memory it can use. So I have increased the Xmx limit, but that is only a temporary solution to the problem.

Eventually I am thinking of making this object the referent of a java.lang.ref.SoftReference. If the system runs low on free memory, it would let the object go and a new one would be created on demand. This would cost some speed after a fresh start, but it is a much better alternative than hitting OOM.

The only problem I see with SoftReferences is that there is no clean way of getting their referents finalized. In my case, I need to close the search handle before it is destroyed, otherwise the system might run out of file descriptors. Obviously, I can wrap this handle in another object, write a finalizer on it (or hook onto a ReferenceQueue/PhantomReference) and let go. But hey, every single article on this planet advises against using finalizers, and especially against finalizers for freeing file handles (e.g. Effective Java, 2nd ed., page 27).

So I am somewhat puzzled. Should I carefully ignore all this advice and go on? Otherwise, are there any other viable alternatives? Thanks in advance.

EDIT #1: The text below was added after testing some code as suggested by Tom Hawtin. To me, it appears that either the suggestion isn't working or I am missing something. Here's the code:

class Bloat {  // just a heap filler really
   private double a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z;

   private final int ii;

   public Bloat(final int ii) {
      this.ii = ii;
   }
}

// as recommended by Tom Hawtin
class MyReference<T> extends SoftReference<T> {
   private final T hardRef;

   MyReference(T referent, ReferenceQueue<? super T> q) {
      super(referent, q);
      this.hardRef = referent;
   }
}

//...meanwhile, somewhere in the neighbouring galaxy...
{
   ReferenceQueue<Bloat> rq = new ReferenceQueue<Bloat>();
   Set<SoftReference<Bloat>> set = new HashSet<SoftReference<Bloat>>();
   int i=0;

   while(i<50000) {
//      set.add(new MyReference<Bloat>(new Bloat(i), rq));
      set.add(new SoftReference<Bloat>(new Bloat(i), rq));

//      MyReference<Bloat> polled = (MyReference<Bloat>) rq.poll();
      SoftReference<Bloat> polled = (SoftReference<Bloat>) rq.poll();

      if (polled != null) {
         Bloat polledBloat = polled.get();
         if (polledBloat == null) {
           System.out.println("is null :(");
         } else {
           System.out.println("is not null!");
         }
      }
      i++;
   }
}

If I run the snippet above with -Xmx10m and SoftReferences (as in the code above), I get tons of "is null :(" printed. But if I switch the code to MyReference (uncommenting the two lines with MyReference and commenting out the ones with SoftReference), I always get an OOM.

As I understood from the advice, having a hard reference inside MyReference should not prevent the object from reaching the ReferenceQueue, right?

放飞的风筝 2024-08-16 09:38:01

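For a finite number of resources: subclass SoftReference. The soft reference should point to the enclosing object. A strong reference in the subclass should refer to the resource, so that the resource is always strongly reachable. When the reference is read through ReferenceQueue poll, the resource can be closed and removed from the cache. The cache needs to be released correctly (if a SoftReference itself is garbage collected, it cannot be enqueued onto a ReferenceQueue).

Beware that you may only keep a finite number of unreleased resources in the cache, so evict old entries (indeed, with a finite cache you can discard the soft references altogether, if that suits your situation). It is usually the non-memory resource that matters more, in which case an LRU-eviction cache with no exotic reference objects should be sufficient.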
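As a rough illustration of the last point, a bounded LRU cache without any reference objects could look like the sketch below. The class name LruHandleCache and the maxOpen parameter are assumptions made for this example; the only library behaviour relied on is LinkedHashMap's access-order mode and its removeEldestEntry hook.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache: keeps at most maxOpen handles and closes
// the least recently used one when that limit is exceeded.
class LruHandleCache<K, V extends Closeable> extends LinkedHashMap<K, V> {
    private final int maxOpen;

    LruHandleCache(int maxOpen) {
        super(16, 0.75f, true); // accessOrder = true: iteration order is LRU first
        this.maxOpen = maxOpen;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() <= maxOpen) {
            return false; // still within the limit, keep everything
        }
        try {
            eldest.getValue().close(); // release the non-memory resource right away
        } catch (IOException e) {
            e.printStackTrace();
        }
        return true; // let LinkedHashMap drop the evicted entry
    }
}
```

With this shape the number of open handles is capped deterministically, independent of when the garbage collector runs. Entries removed explicitly (remove, clear) are not closed by this sketch and would need the same treatment.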

(My answer #1000. Posted from London DevDay.)

蔚蓝源自深海 2024-08-16 09:38:01

Tom's answer is the correct one; however, the code that has been added to the question is not the same as what Tom proposed. What Tom was proposing looks more like this:

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashSet;
import java.util.Set;

class Bloat {  // just a heap filler really
    public Reader res;
    private double a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z;

    private final int ii;

    public Bloat(final int ii, Reader res) {
        this.ii = ii;
        this.res = res;
    }
}

// as recommended by Tom Hawtin
class MySoftBloatReference extends SoftReference<Bloat> {
    // strong reference to the resource only, never to the Bloat itself
    public final Reader hardRef;

    MySoftBloatReference(Bloat referent, ReferenceQueue<Bloat> q) {
        super(referent, q);
        this.hardRef = referent.res;
    }
}

//...meanwhile, somewhere in the neighbouring galaxy...
{
    ReferenceQueue<Bloat> rq = new ReferenceQueue<Bloat>();
    // keeps the reference objects themselves reachable so they can be enqueued
    Set<SoftReference<Bloat>> set = new HashSet<SoftReference<Bloat>>();
    int i = 0;

    while (i < 50000) {
        set.add(new MySoftBloatReference(new Bloat(i, new StringReader("test")), rq));

        // non-blocking: returns null when nothing has been enqueued yet
        MySoftBloatReference polled = (MySoftBloatReference) rq.poll();

        if (polled != null) {
            // close the reference that we are holding on to
            try {
                polled.hardRef.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        i++;
    }
}

Note that the big difference is that the hard reference points to the object that needs to be closed. The surrounding object can, and will, be garbage collected, so you won't hit the OOM, yet you still get a chance to close the resource. Once you leave the loop, that will also be garbage collected. Of course, in the real world, you probably wouldn't make res a public instance member.

That said, if you are holding open file references, then you run a very real risk of running out of those before you run out of memory. You probably also want to have an LRU cache to ensure that you keep no more than (sticks finger in the air) 500 open files. These can also be of type MyReference so that they can also be garbage collected if need be.

To clarify a little how MySoftBloatReference works: the base class, SoftReference, still holds the reference to the object that is hogging all of the memory. This is the object that needs to be freed to prevent the OOM from happening. However, once the object is freed, you still need to free the resources that the Bloat was using. In other words, Bloat uses two types of resource, memory and a file handle, and both need to be freed, or you will run out of one or the other. The SoftReference handles the pressure on the memory resource by freeing that object, but you also need to release the other resource, the file handle. Because the Bloat has already been freed, we can't use it to free the related resource, so MySoftBloatReference keeps a hard reference to the internal resource that needs to be closed. Once it has been informed that the Bloat has been freed, i.e. once the reference turns up in the ReferenceQueue, MySoftBloatReference can close the related resource through the hard reference that it holds.
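The same idea can be folded into a small holder that also re-creates the handle on demand, which is what the question was after. This is only a sketch under stated assumptions: SoftHandle, Ref, opener and resourceOf are names invented for the example, and the function passed as resourceOf must return an object that does not itself reference the handle, otherwise the handle stays strongly reachable and is never cleared.

```java
import java.io.Closeable;
import java.io.IOException;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical holder, not part of any library discussed in this thread.
final class SoftHandle<H> {

    // Soft reference to the heavy handle, strong reference to what must be closed.
    static final class Ref<H> extends SoftReference<H> {
        final Closeable resource;

        Ref(H handle, Closeable resource, ReferenceQueue<? super H> queue) {
            super(handle, queue);
            this.resource = resource;
        }
    }

    private final ReferenceQueue<H> queue = new ReferenceQueue<>();
    // Keeps every Ref strongly reachable until it has been dequeued and closed;
    // a Ref that is itself collected would never show up in the queue.
    private final Set<Ref<H>> live = new HashSet<>();
    private final Supplier<H> opener;
    private final Function<H, Closeable> resourceOf;
    Ref<H> current; // most recent reference, may already have been cleared

    SoftHandle(Supplier<H> opener, Function<H, Closeable> resourceOf) {
        this.opener = opener;
        this.resourceOf = resourceOf;
    }

    // Returns the cached handle, or opens a new one if the old one was cleared.
    synchronized H get() {
        drain();
        H handle = (current == null) ? null : current.get();
        if (handle == null) {
            handle = opener.get();
            current = new Ref<>(handle, resourceOf.apply(handle), queue);
            live.add(current);
        }
        return handle;
    }

    // Closes the resource of every handle whose reference has been enqueued.
    synchronized void drain() {
        Reference<? extends H> ref;
        while ((ref = queue.poll()) != null) {
            Ref<?> cleared = (Ref<?>) ref;
            live.remove(cleared);
            if (cleared == current) {
                current = null;
            }
            try {
                cleared.resource.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
```

Callers would use get() for every query and never store the returned handle in a long-lived field, since any strong reference to it defeats the soft reference.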

EDIT: Updated the code so that it compiles when thrown into a class. It uses a StringReader to illustrate how to close the Reader, which stands in for the external resource that needs to be freed. In this particular case closing the stream is effectively a no-op, and so is not needed, but it shows how to do so when it is.

唱一曲作罢 2024-08-16 09:38:01

Ahm.
(As far as I know) you can't hold the stick from both ends. Either you hold on to your information, or you let it go.
However... you can hold on to some key information that would enable you to finalize. Of course, the key information must be significantly smaller than the "real information" and must not have the real information in its reachable object graph (weak references might help you there).
Building on the existing example (pay attention to the key information field):

import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashSet;
import java.util.Set;

public class Test1 {
    static class Bloat {  // just a heap filler really
        private double a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z;

        private final int ii;

        public Bloat(final int ii) {
            this.ii = ii;
        }
    }

    // as recommended by Tom Hawtin
    static class MyReference<T, K> extends SoftReference<T> {
        // small piece of data that survives the referent and is enough to clean up after it
        private final K keyInformation;

        MyReference(T referent, K keyInformation, ReferenceQueue<? super T> q) {
            super(referent, q);
            this.keyInformation = keyInformation;
        }

        public K getKeyInformation() {
            return keyInformation;
        }
    }

    //...meanwhile, somewhere in the neighbouring galaxy...
    public static void main(String[] args) {
        ReferenceQueue<Bloat> rq = new ReferenceQueue<Bloat>();
        Set<SoftReference<Bloat>> set = new HashSet<SoftReference<Bloat>>();
        int i = 0;

        while (i < 50000) {
            set.add(new MyReference<Bloat, Integer>(new Bloat(i), i, rq));

            final Reference<? extends Bloat> polled = rq.poll();

            if (polled != null) {
                if (polled instanceof MyReference) {
                    final Object keyInfo = ((MyReference<?, ?>) polled).getKeyInformation();
                    System.out.println("not null, got key info: " + keyInfo + ", finalizing...");
                } else {
                    System.out.println("null, can't finalize.");
                }
                // poll() has already dequeued the reference; a further rq.remove()
                // would block until another reference is enqueued
                System.out.println("removed reference");
            }
            i++;
        }
    }
}

Edit:
I want to elaborate on the "either hold on to your information or let it go". Assume you had some way of holding on to your information. That would have forced the GC to unmark your data, causing the data to actually be cleaned only after you're done with it, in a second GC cycle. This is possible, and it's exactly what finalize() is for. Since you stated that you don't want the second cycle to occur, you can't hold on to your information (if a-->b then !b-->!a), which means you must let it go.

Edit2:
Actually, a second cycle would occur, but for your "key data", not your "major bloat data". The actual data would be cleared on the first cycle.

Edit3:
Obviously, the real solution would use a separate thread for removing from the reference queue (don't poll(); call remove(), blocking on the dedicated thread).
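A dedicated cleanup thread along those lines might be sketched as follows. QueueReaper, TrackedRef and track are hypothetical names chosen for this example; the only JDK behaviour it depends on is that ReferenceQueue.remove() blocks until a reference is available and throws InterruptedException when the waiting thread is interrupted.

```java
import java.io.Closeable;
import java.io.IOException;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: a daemon thread blocks on the queue and closes resources.
final class QueueReaper<T> {

    static final class TrackedRef<T> extends SoftReference<T> {
        final Closeable resource; // must not reference the referent

        TrackedRef(T referent, Closeable resource, ReferenceQueue<? super T> queue) {
            super(referent, queue);
            this.resource = resource;
        }
    }

    private final ReferenceQueue<T> queue = new ReferenceQueue<>();
    // Keeps the reference objects reachable until they have been processed.
    private final Set<TrackedRef<T>> live = ConcurrentHashMap.newKeySet();
    private final Thread worker = new Thread(this::run, "reference-reaper");

    void start() {
        worker.setDaemon(true); // do not keep the JVM alive just for cleanup
        worker.start();
    }

    void stop() {
        worker.interrupt();
    }

    TrackedRef<T> track(T referent, Closeable resource) {
        TrackedRef<T> ref = new TrackedRef<>(referent, resource, queue);
        live.add(ref);
        return ref;
    }

    private void run() {
        try {
            while (true) {
                Reference<? extends T> ref = queue.remove(); // blocks until one is enqueued
                TrackedRef<?> tracked = (TrackedRef<?>) ref;
                live.remove(tracked);
                try {
                    tracked.resource.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        } catch (InterruptedException e) {
            // stop() was called: leave the loop and let the thread end
        }
    }
}
```

Because the worker blocks in remove(), the allocating code no longer has to poll the queue itself, and resources are closed soon after the collector enqueues their references rather than on the next allocation.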

红墙和绿瓦 2024-08-16 09:38:01

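@Paul - thanks a lot for the answer and the clarification.

@Ran - I think i++ is missing at the end of the loop in your current code. Also, you don't need to call rq.remove() in the loop, since rq.poll() already removes the top reference, doesn't it?

A few points:

1) I had to add a Thread.sleep(1) statement after i++ in the loop (for both Paul's and Ran's solutions) to avoid OOM, but that is irrelevant to the big picture and is also platform dependent. My machine has a quad-core CPU and runs the Sun JDK 1.6.0_16 on Linux.

2) After looking at these solutions, I think I'll stick with finalizers. Bloch's book gives the following reasons against them:

- There is no guarantee that finalizers will be executed promptly, so never do anything time-critical in a finalizer -- nor are there any such guarantees for SoftReference!
- Never depend on a finalizer to update critical persistent state -- I am not.
- There is a severe performance penalty for using finalizers -- in my worst case I'd be finalizing about one object per minute or so. I think I can live with that.
- Use try/finally instead -- oh yes, I surely will!

Having to build an enormous amount of scaffolding for what seems to be a simple task doesn't look reasonable to me.
I mean, literally, the WTF-per-minute rate would be quite high for anybody else looking at such code.

3) Sadly, there is no way to split the points between Paul, Tom and Ran :(
I hope Tom won't mind, as he has already got lots of them :) Judging between Paul and Ran was much harder; I think both answers work and are correct. I am setting the accept flag on Paul's answer only because it is rated higher (and has a more detailed explanation), but Ran's solution is not bad at all and would probably be my choice if I chose to implement this using SoftReferences. Thanks, guys!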