Compression strategies for a cache system in Java
I am building a cache that has to store as much data as possible. CPU is not a major issue, because the next level of data is a lot more expensive to reach than running the CPUs a little for decompression.
I'm looking for a good strategy, not a full implementation. A typical object instance that should be cached can be generalized as a list of hashmaps. The keys in these maps are very similar to the keys in the other maps in that list. Keys and values are strings.
Maps in different cached objects (which also means different lists) may not always have similar keys; maybe only a subset (50%) of the keys is the same.
I was thinking of extracting the keys into ONE header array, and extracting each hashmap's collection of values into another array of the same length. This means the data arrays may be sparse (null pointers), but I don't have to carry the metadata around: the position in the data array is the only way of looking up the correct key.
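Roughly, this is the layout I have in mind (a sketch with placeholder names; lookups go through a shared key index):

    import java.util.*;

    // One shared "header" of keys; per-map value arrays aligned to it.
    final class SparseRows {
        static final List<String> HEADER = new ArrayList<>();            // ordered keys seen so far
        static final Map<String, Integer> KEY_INDEX = new HashMap<>();   // key -> position in HEADER

        // Turn a map into a sparse value array aligned with HEADER.
        static String[] toRow(Map<String, String> map) {
            for (String key : map.keySet()) {
                KEY_INDEX.computeIfAbsent(key, k -> {
                    HEADER.add(k);
                    return HEADER.size() - 1;
                });
            }
            String[] row = new String[HEADER.size()]; // nulls where this map lacks a key
            map.forEach((k, v) -> row[KEY_INDEX.get(k)] = v);
            return row;
        }

        // The position in the row is the only link back to the key.
        static String get(String[] row, String key) {
            Integer i = KEY_INDEX.get(key);
            return (i == null || i >= row.length) ? null : row[i];
        }
    }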
Now I want to compress the data arrays. Compression won't work well on a single data array because it holds little information; a few data arrays have to be stuck together to get a good compression ratio.
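To make that concrete, the batching could look roughly like this (java.util.zip's Deflater is just one candidate; the framing format is made up):

    import java.io.*;
    import java.util.*;
    import java.util.zip.*;

    final class BlockCompressor {
        // Stick a batch of value arrays together and compress them as one block,
        // so the deflater can exploit repetition across the arrays.
        static byte[] compress(List<String[]> batch) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(
                    new DeflaterOutputStream(bytes, new Deflater(Deflater.BEST_COMPRESSION)))) {
                out.writeInt(batch.size());
                for (String[] row : batch) {
                    out.writeInt(row.length);
                    for (String s : row) {
                        out.writeBoolean(s != null); // preserve the sparse nulls
                        if (s != null) out.writeUTF(s);
                    }
                }
            }
            return bytes.toByteArray();
        }
    }

Decompression would reverse this with an InflaterInputStream wrapped in a DataInputStream.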
Is there any good way of compressing string arrays in Java? How many of these data arrays should I cluster together for good results?
Is there maybe a better approach? This is an open question for collecting ideas, so please feel free to elaborate :-)
2 Answers
Flyweight can help
If you are not compressing, you can use the Flyweight pattern to avoid the cost of the string keys being repeated in each object.
Remember that a string is an object, so a key in your hashmap is a reference to it. If many objects with the same property use references to the same string object, you pay only 4 bytes for each reference and there is only one string in memory.
How do you ensure the string objects are shared between objects? You can use something similar to String.intern(), but please don't use String.intern() itself. Interning a string means returning the same string object for every equal string value, so you have to hold a cache of those strings. The reason I don't recommend String.intern() is that its cache lives in the String class itself, so it never gets freed. But you can implement something analogous.
This code returns your own string if it's new, and returns the first one if it's not.
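A minimal sketch of such a pool, assuming a ConcurrentHashMap as the backing cache (the method name returnUniqueString is referenced below; the rest is illustrative):

    import java.util.concurrent.ConcurrentHashMap;

    final class StringPool {
        private final ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<>();

        // Returns the caller's string if it is new to the pool,
        // otherwise the instance that was stored first.
        String returnUniqueString(String s) {
            String first = pool.putIfAbsent(s, s);
            return first != null ? first : s;
        }
    }

Unlike String.intern(), the whole pool can be dropped together with the cache, so its strings remain eligible for garbage collection.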
But not if you are compressing
Because compressing means you are serializing your object graph, and each property name will be serialized as a separate string, repeating itself. The compressed size may not grow much, since it's a repeated string, but when you rehydrate the objects the strings will be created separately.
Maybe you can apply returnUniqueString at the time of rehydrating :)
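For instance, a sketch of that rehydration step (reusing the StringPool above; the method name canonicalize is made up):

    import java.util.*;

    final class Rehydrator {
        // After deserializing a map, route keys and values through the pool
        // so equal strings share a single instance again.
        static Map<String, String> canonicalize(Map<String, String> raw, StringPool pool) {
            Map<String, String> result = new HashMap<>();
            raw.forEach((k, v) -> result.put(pool.returnUniqueString(k),
                                             pool.returnUniqueString(v)));
            return result;
        }
    }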
This sounds like a good approach.
However, I suggest you consider a different way of breaking the map values into lists: rather than making a list for each map, make a list for each distinct key, containing that key's values for each item.
For example, say your maps are something like this (made-up values):
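    // two hypothetical items, each a map from string keys to string values
    Map<String, String> item1 = Map.of("name", "anne", "city", "berlin");
    Map<String, String> item2 = Map.of("name", "bob",  "city", "paris");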
Then you decompose into:
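    // one list per distinct key; position i holds that key's value for item i
    List<String> names  = List.of("anne", "bob");
    List<String> cities = List.of("berlin", "paris");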
This may seem a bit weird, but the point is that you cluster values of the same type together, which will help the compression be more efficient.