Compression strategies for a cache system in Java
I am building a cache that has to store as much data as possible. CPU is not a major issue, because the next level of data is a lot more expensive to reach than running the CPUs a little for decompression.
I'm looking for a good strategy, not a full implementation. A typical object instance that should be cached can be generalized as a list of hashmaps. The keys in these maps are very similar to the keys in the other maps in that list. Keys and values are strings.
Maps in different cached objects (which also means different lists) may not always have similar keys; maybe only a subset (50%) of the keys is the same.
I was thinking of extracting the keys into ONE header array, and extracting each hashmap's collection of values into another array of the same length. This means the data arrays may be sparse (null pointers), but I don't have to carry the metadata around: the position in the data array is the only way of looking up the correct key.
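Roughly, this is the layout I have in mind (a sketch with placeholder names; lookups go through a shared key index):

    import java.util.*;

    // One shared "header" of keys; per-map value arrays aligned to it.
    final class SparseRows {
        static final List<String> HEADER = new ArrayList<>();            // ordered keys seen so far
        static final Map<String, Integer> KEY_INDEX = new HashMap<>();   // key -> position in HEADER

        // Turn a map into a sparse value array aligned with HEADER.
        static String[] toRow(Map<String, String> map) {
            for (String key : map.keySet()) {
                KEY_INDEX.computeIfAbsent(key, k -> {
                    HEADER.add(k);
                    return HEADER.size() - 1;
                });
            }
            String[] row = new String[HEADER.size()]; // nulls where this map lacks a key
            map.forEach((k, v) -> row[KEY_INDEX.get(k)] = v);
            return row;
        }

        // The position in the row is the only link back to the key.
        static String get(String[] row, String key) {
            Integer i = KEY_INDEX.get(key);
            return (i == null || i >= row.length) ? null : row[i];
        }
    }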
Now I want to compress the data arrays. Compression won't work well on a single data array because it holds little information; a few data arrays have to be stuck together to get a good compression ratio.
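To make that concrete, the batching could look roughly like this (java.util.zip's Deflater is just one candidate; the framing format is made up):

    import java.io.*;
    import java.util.*;
    import java.util.zip.*;

    final class BlockCompressor {
        // Stick a batch of value arrays together and compress them as one block,
        // so the deflater can exploit repetition across the arrays.
        static byte[] compress(List<String[]> batch) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(
                    new DeflaterOutputStream(bytes, new Deflater(Deflater.BEST_COMPRESSION)))) {
                out.writeInt(batch.size());
                for (String[] row : batch) {
                    out.writeInt(row.length);
                    for (String s : row) {
                        out.writeBoolean(s != null); // preserve the sparse nulls
                        if (s != null) out.writeUTF(s);
                    }
                }
            }
            return bytes.toByteArray();
        }
    }

Decompression would reverse this with an InflaterInputStream wrapped in a DataInputStream.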
Is there any good way of compressing string arrays in Java? How many of these data arrays should I cluster together for good results?
Is there maybe a better approach? This is an open question for collecting ideas, so please feel free to elaborate :-)
2 Answers
Flyweight can help
If you are not compressing, you can use the Flyweight pattern to avoid the cost of the string keys being repeated in each object.
Remember that a string is an object, so a key in your hashmap is a reference to it. If many objects with the same property use references to the same string object, you pay only 4 bytes for each reference and there is only one string in memory.
How do you ensure the string objects are shared between objects? You can use something similar to String.intern(), but please don't use String.intern() itself. Interning a string means returning the same string object for every equal string value, so you have to hold a cache of those strings. The reason I don't recommend String.intern() is that its cache lives in the String class itself, so it never gets freed. But you can implement something analogous.
This code returns your own string if it's new, and returns the first one if it's not.
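A minimal sketch of such a pool, assuming a ConcurrentHashMap as the backing cache (the method name returnUniqueString is referenced below; the rest is illustrative):

    import java.util.concurrent.ConcurrentHashMap;

    final class StringPool {
        private final ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<>();

        // Returns the caller's string if it is new to the pool,
        // otherwise the instance that was stored first.
        String returnUniqueString(String s) {
            String first = pool.putIfAbsent(s, s);
            return first != null ? first : s;
        }
    }

Unlike String.intern(), the whole pool can be dropped together with the cache, so its strings remain eligible for garbage collection.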
But not if you are compressing
Because compressing means you are serializing your object graph, and each property name will be serialized as a separate string, repeating itself. The compressed size may not grow much, since it's a repeated string, but when you rehydrate the objects the strings will be created separately.
Maybe you can apply returnUniqueString at the time of rehydrating :)
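For instance, a sketch of that rehydration step (reusing the StringPool above; the method name canonicalize is made up):

    import java.util.*;

    final class Rehydrator {
        // After deserializing a map, route keys and values through the pool
        // so equal strings share a single instance again.
        static Map<String, String> canonicalize(Map<String, String> raw, StringPool pool) {
            Map<String, String> result = new HashMap<>();
            raw.forEach((k, v) -> result.put(pool.returnUniqueString(k),
                                             pool.returnUniqueString(v)));
            return result;
        }
    }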
This sounds like a good approach.
However, I suggest you consider a different way of breaking the map values into lists: rather than making a list for each map, make a list for each distinct key, containing that key's values for each item.
For example, say your maps are something like this (made-up values):
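    // two hypothetical items, each a map from string keys to string values
    Map<String, String> item1 = Map.of("name", "anne", "city", "berlin");
    Map<String, String> item2 = Map.of("name", "bob",  "city", "paris");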
Then you decompose into:
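    // one list per distinct key; position i holds that key's value for item i
    List<String> names  = List.of("anne", "bob");
    List<String> cities = List.of("berlin", "paris");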
This may seem a bit weird, but the point is that you cluster values of the same type together, which will help the compression be more efficient.