Lucene as a high-volume cache?
I work with a system where we do a lot of batch processing. We have pre-loaded caches to help with data lookup performance. There are scenarios, however, where I cannot fit the entire dataset in memory. Up to this point, I've fallen back to running a database query to look up the data, which kills performance. To address this, I've added a hybrid cache: a HashMap up to a threshold, after which entries spill over to a Lucene index on the local file system. This is a definite improvement over running the query (anywhere from 6 to 10 times faster than the database query). However, I was hoping for something a little better and was wondering whether there are better alternatives for this kind of thing. I'm using a single String as my key and am caching Java objects. I'd like to stick with a Java library so as not to complicate my deployment (I'd like to avoid a separate server process). Is anyone else using Lucene for this purpose? Are there better alternatives?
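For readers who want to picture the hybrid cache described above, here is a minimal sketch of the general shape: a HashMap up to a threshold, then a spill to the local file system. It is not the poster's code. It swaps the Lucene index for plain Java serialization with one file per key, and the names `HybridCache`, `inTempDir`, and `inMemory` are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the poster's code: a HashMap up to a threshold,
// then one serialized file per entry in a local directory (instead of Lucene).
class HybridCache<V extends Serializable> {
    private final Map<String, V> memory = new HashMap<>();
    private final int threshold;
    private final Path spillDir;

    HybridCache(int threshold, Path spillDir) {
        this.threshold = threshold;
        this.spillDir = spillDir;
    }

    // Convenience factory (an assumption): spill into a fresh temporary directory.
    static <V extends Serializable> HybridCache<V> inTempDir(int threshold) {
        try {
            return new HybridCache<>(threshold, Files.createTempDirectory("hybrid-cache"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void put(String key, V value) {
        // Memory never shrinks in this sketch, so a key lives in exactly one tier.
        if (memory.containsKey(key) || memory.size() < threshold) {
            memory.put(key, value);
            return;
        }
        // Memory tier is full: serialize the value to its own file on disk.
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(fileFor(key)))) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the cached value, or null if the key is in neither tier.
    V get(String key) {
        V hit = memory.get(key);
        if (hit != null) return hit;
        Path file = fileFor(key);
        if (!Files.exists(file)) return null;
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            @SuppressWarnings("unchecked")
            V value = (V) in.readObject();
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    boolean inMemory(String key) {
        return memory.containsKey(key);
    }

    // Hex-encode the key so any String maps to a safe, unique file name.
    private Path fileFor(String key) {
        StringBuilder name = new StringBuilder("k");
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            name.append(String.format("%02x", b));
        }
        return spillDir.resolve(name + ".bin");
    }
}
```

With `HybridCache<String> cache = HybridCache.inTempDir(2)`, the first two distinct keys stay in the HashMap and a third goes to disk; `get` checks memory first and only then touches the file system. The sketch ignores eviction, concurrency, null values, and very long keys (which would exceed file name limits), all of which a real cache library handles.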
Comments (3)
I think that if you are using a single string as the key and don't need to run queries against the stored data, you could use Google's LevelDB. It performs well and has low memory usage. Check this: http://code.google.com/p/leveldb/
There are several cache libraries available that should be able to handle your situation by writing cache entries to disk once a certain threshold is reached. A good strategy is to write the least accessed entries to disk. There are also caches that distribute the entries over a cluster, keeping everything in memory.
A cache solution that I use often is Infinispan: http://www.jboss.org/infinispan
It is fast, easy to use, scalable and can certainly handle your problem.
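As a rough illustration of the spill strategy mentioned in this answer, the sketch below keeps a bounded number of entries in memory and hands the coldest one to a callback when the bound is exceeded. It assumes "least accessed" means least recently accessed (LRU), which is only one possible reading, and it uses the standard `LinkedHashMap` rather than Infinispan's API. The class name `LruSpillMap` and the `spill` callback are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch: keep at most maxInMemory entries; when the limit is
// exceeded, pass the least recently accessed entry to a spill callback
// (for example, code that writes it to disk) and drop it from memory.
class LruSpillMap<K, V> extends LinkedHashMap<K, V> {
    private final int maxInMemory;
    private final BiConsumer<K, V> spill;

    LruSpillMap(int maxInMemory, BiConsumer<K, V> spill) {
        // accessOrder = true: iteration order runs from least to most recently accessed.
        super(16, 0.75f, true);
        this.maxInMemory = maxInMemory;
        this.spill = spill;
    }

    // Called by LinkedHashMap after each insertion; 'eldest' is the LRU entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        if (size() > maxInMemory) {
            spill.accept(eldest.getKey(), eldest.getValue());
            return true; // remove it from the in-memory map
        }
        return false;
    }
}
```

For example, with a limit of 2 and a callback that stores into a second map, inserting `a` and `b`, reading `a`, then inserting `c` spills `b`, because `b` is the entry that has gone longest without being touched. A complete cache would also need to consult the spill store on a miss.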
EhCache seems to be a good fit for what I need. It performs much better than Lucene for this single-key lookup. It supports disk overflow and is simple to use.