Reducing the granularity of a data set



I have an in-memory cache which stores a set of information by a certain level of aggregation - in the Students example below let's say I store it by Year, Subject, Teacher:

#    Students    Year    Subject    Teacher
1    30          7       Math       Mrs Smith
2    28          7       Math       Mr Cork
3    20          8       Math       Mrs Smith
4    20          8       English    Mr White
5    18          8       English    Mr Book
6    10          12      Math       Mrs Jones

Now unfortunately my cache doesn't have GROUP BY or similar functions - so when I want to look at things at a higher level of aggregation, I will have to 'roll up' the data myself. For example, if I aggregate Students by Year, Subject the aforementioned data would look like so:

#    Students    Year    Subject
1    58          7       Math
2    20          8       Math 
3    38          8       English
4    10          12      Math

My question is thus - how would I best do this in Java? Theoretically I could be pulling back tens of thousands of objects from this cache, so being able to 'roll up' these collections quickly may become very important.

My initial (perhaps naive) thought would be to do something along the following lines;

Until I exhaust the list of records:

  • Each 'unique' record that I come across is added as a key to a hashmap.
  • If I encounter a record that has the same data for this new level of aggregation, add its quantity to the existing one (sketched in code below).
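In code, that idea might look like the following minimal sketch. All class and field names here are illustrative (the real cache API isn't shown), and it assumes Java 16+ for record types:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class RollUpSketch {

        // Hypothetical row type mirroring the cached data in the table above.
        record StudentRow(int students, int year, String subject, String teacher) {}

        // Composite key for the coarser aggregation level (Year, Subject).
        // Records get equals()/hashCode() for free, so they work as map keys.
        record YearSubject(int year, String subject) {}

        // Single pass over the rows: one HashMap lookup per record, O(n) overall.
        static Map<YearSubject, Integer> rollUp(List<StudentRow> rows) {
            Map<YearSubject, Integer> totals = new HashMap<>();
            for (StudentRow r : rows) {
                // merge() inserts the count for an unseen key, or adds it
                // to the running total for a key we've already met.
                totals.merge(new YearSubject(r.year(), r.subject()),
                             r.students(), Integer::sum);
            }
            return totals;
        }

        // The same roll-up via the Streams API (Java 8+ collectors).
        static Map<YearSubject, Integer> rollUpWithStreams(List<StudentRow> rows) {
            return rows.stream().collect(Collectors.groupingBy(
                    r -> new YearSubject(r.year(), r.subject()),
                    Collectors.summingInt(StudentRow::students)));
        }

        public static void main(String[] args) {
            List<StudentRow> rows = List.of(
                    new StudentRow(30, 7, "Math", "Mrs Smith"),
                    new StudentRow(28, 7, "Math", "Mr Cork"),
                    new StudentRow(20, 8, "Math", "Mrs Smith"),
                    new StudentRow(20, 8, "English", "Mr White"),
                    new StudentRow(18, 8, "English", "Mr Book"),
                    new StudentRow(10, 12, "Math", "Mrs Jones"));
            // Prints e.g. "7 Math -> 58", matching the rolled-up table above.
            rollUp(rows).forEach((k, v) ->
                    System.out.println(k.year() + " " + k.subject() + " -> " + v));
        }
    }

Either variant makes a single pass with one hash lookup per record, so tens of thousands of rows should not be a problem in practice.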

Now for all I know this is a fairly common problem and there's much better ways of doing this. So I'd welcome any feedback as to whether I'm pointing myself in the right direction.

"Get a new cache" not an option I'm afraid :)

-Dave.


Comments (1)

上课铃就是安魂曲 2024-11-21 20:15:22


Your "initial thought" isn't a bad approach. The only way to improve on it would be to have an index for the fields on which you are aggregating (year and subject). (That's basically what a dbms does when you define an index.) Then your algorithm could be recast as iterating through all index values; you wouldn't have to check the results hash for each record.

Of course, you would have to build the index when populating the cache and maintain it as data is modified.
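A rough sketch of what maintaining such an index might look like, with the cache itself stubbed out and the row/key types purely illustrative (the same hypothetical StudentRow and YearSubject shapes as in the question):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class IndexedCacheSketch {

        // Hypothetical types mirroring the question's data.
        record StudentRow(int students, int year, String subject, String teacher) {}
        record YearSubject(int year, String subject) {}

        // The index: aggregation key -> rows that fall into that bucket.
        // Built while the cache is populated, maintained on every modification.
        private final Map<YearSubject, List<StudentRow>> index = new HashMap<>();

        public void add(StudentRow row) {
            index.computeIfAbsent(keyOf(row), k -> new ArrayList<>()).add(row);
        }

        public void remove(StudentRow row) {
            List<StudentRow> bucket = index.get(keyOf(row));
            if (bucket != null) {
                bucket.remove(row);
            }
        }

        // The roll-up now just walks the index buckets: no per-record probe
        // of a results hash, because the grouping work was done up front.
        public Map<YearSubject, Integer> rollUp() {
            Map<YearSubject, Integer> totals = new HashMap<>();
            index.forEach((key, bucket) -> totals.put(key,
                    bucket.stream().mapToInt(StudentRow::students).sum()));
            return totals;
        }

        private static YearSubject keyOf(StudentRow row) {
            return new YearSubject(row.year(), row.subject());
        }
    }

If the per-bucket rows aren't needed for anything else, the index could instead hold running integer totals, which would make rollUp() a plain copy of the map.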
