Data structure for counting frequencies in database-table-like data
I was wondering whether there is a data structure optimized for counting frequencies in data stored in a database-table-like format. For example, the data arrives in the comma-delimited format below.
col1, col2, col3
x, a, green
x, b, blue
...
y, c, green
Now I simply want to count the frequency of col1=x, or of col1=x and col2=green. I have been storing the data in a database table, but both profiling and empirical observation show that the database connection is the bottleneck. I have also tried in-memory database solutions, and they work quite well; the only problems are the memory requirements and the quirky init/destroy calls.
Also, I work mainly with Java but have experience with .NET, and I was wondering whether there is any API for working with "tabular" data in a LINQ-like way in Java.
Any help is appreciated.
Comments (2)
How about a nested TreeMap? For example, say you have the following records:
You want to be able to query the structure and ask, "how many times did col1 have the value v?"
I'd use the following code to insert values into the structure:
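The answer's original code did not survive on this page, so the following is only a minimal sketch of the nested TreeMap idea, not the author's implementation. The class name `ColumnFrequencies` and its `add` and `count` methods are hypothetical, and the rows are the three visible sample rows from the question (the elided rows are left out).

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts how often each value occurs in each column.
class ColumnFrequencies {
    // Outer map: column name -> inner map (value -> number of occurrences).
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    // Record one occurrence of `value` in `column`.
    public void add(String column, String value) {
        counts.computeIfAbsent(column, k -> new TreeMap<>())
              .merge(value, 1, Integer::sum);
    }

    // "How many times did `column` have `value`?" Returns 0 if never seen.
    public int count(String column, String value) {
        return counts.getOrDefault(column, Collections.emptyMap())
                     .getOrDefault(value, 0);
    }

    public static void main(String[] args) {
        String[] header = {"col1", "col2", "col3"};
        String[][] rows = {
            {"x", "a", "green"},
            {"x", "b", "blue"},
            {"y", "c", "green"},
        };
        ColumnFrequencies freq = new ColumnFrequencies();
        for (String[] row : rows) {
            for (int i = 0; i < header.length; i++) {
                freq.add(header[i], row[i]);
            }
        }
        System.out.println(freq.count("col1", "x"));     // prints 2
        System.out.println(freq.count("col3", "green")); // prints 2
    }
}
```

This layout answers single-column questions only. Counting a combination such as col1=x together with another column would need a further level of nesting or a composite key.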
There is a Multiset data structure that keeps track of the frequencies for you. Here is sample code using that data structure (from Google Guava).
Points to note:
1. Join the column name (col1) and its value (x) with (=) as the delimiter when adding to the Multiset.
2. Check for the frequency of a particular value in a given column.
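The Guava sample itself is missing from this page. To keep the example free of external dependencies, the sketch below swaps in a plain `java.util.HashMap` as a stand-in for the Multiset; it is an illustration of the two points above, not the answer's original code. The class name `KeyedCounts` is hypothetical. With Guava on the classpath, `HashMultiset.create()`, `add(element)` and `count(element)` would play the same roles.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a multiset of "column=value" strings.
class KeyedCounts {
    // Element -> number of times it was added.
    private final Map<String, Integer> counts = new HashMap<>();

    // Point 1: join column name and value with "=" when adding.
    public void add(String column, String value) {
        counts.merge(column + "=" + value, 1, Integer::sum);
    }

    // Point 2: look up the frequency of a value in a given column.
    public int count(String column, String value) {
        return counts.getOrDefault(column + "=" + value, 0);
    }

    public static void main(String[] args) {
        String[] header = {"col1", "col2", "col3"};
        String[][] rows = {
            {"x", "a", "green"},
            {"x", "b", "blue"},
            {"y", "c", "green"},
        };
        KeyedCounts freq = new KeyedCounts();
        for (String[] row : rows) {
            for (int i = 0; i < header.length; i++) {
                freq.add(header[i], row[i]);
            }
        }
        System.out.println(freq.count("col1", "x"));    // prints 2
        System.out.println(freq.count("col3", "blue")); // prints 1
    }
}
```

One caveat of the delimiter approach: if column names or values can themselves contain "=", different pairs could collide on the same key, so a delimiter that cannot appear in the data is the safer choice.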