Data structure for counting frequencies in a database table-like format

Posted 2024-10-11 22:16:43

I was wondering if there is a data structure optimized for counting frequencies in data that is stored in a database table-like format. For example, the data comes in the (comma) delimited format below.

col1, col2, col3
x, a, green
x, b, blue
...
y, c, green

Now I simply want to count the frequency of col1=x, or of col1=x and col3=green. I have been storing the data in a database table, but both my profiling and empirical observation show that the database connection is the bottleneck. I have tried in-memory database solutions too, and they work quite well; the only problems are the memory requirements and the quirky init/destroy calls.

Also, I work mainly with Java but have experience with .NET, and I was wondering if there is any API for working with "tabular" data in a LINQ-like way from Java.

Any help is appreciated.

Comments (2)

不念旧人 2024-10-18 22:16:43

How about a nested TreeMap? For example, say you have the following records:

col1=v, col2=v2
col1=v, col2=v3

You want to be able to query the structure and ask, "how many times did col1 have the value v?"

I'd use the following code to insert values into the structure:

// needs: import java.util.TreeMap;
// column name -> (column value -> count); declare this once, outside the loop over records
TreeMap<String, TreeMap<String, Integer>> tm = new TreeMap<>();
if (!tm.containsKey(columnName)) {
    // the map hasn't seen this column name yet:
    // create its value map and mark the column value as being seen once
    TreeMap<String, Integer> valueMap = new TreeMap<>();
    valueMap.put(colVal, 1);
    tm.put(columnName, valueMap);
} else {
    // the map has seen the column name
    TreeMap<String, Integer> valueMap = tm.get(columnName);
    if (valueMap.containsKey(colVal)) {
        // we've seen this column value before:
        // increment the number of times we've seen it
        int valCount = valueMap.get(colVal);
        valueMap.put(colVal, valCount + 1);
    } else {
        // we have not seen this column value before
        valueMap.put(colVal, 1);
    }
}
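
For what it's worth, the same nested-map idea can be written more compactly with `computeIfAbsent` and `merge` from the standard `Map` interface. This is only a sketch: the class name `NestedFrequencyCounter` and its `add`/`count` methods are made up for illustration, and it counts single-column values only, not combinations of columns.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, not part of any library.
class NestedFrequencyCounter {
    // column name -> (column value -> number of times seen)
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    /** Records one occurrence of colVal in the column columnName. */
    public void add(String columnName, String colVal) {
        counts.computeIfAbsent(columnName, k -> new TreeMap<>())
              .merge(colVal, 1, Integer::sum);
    }

    /** Returns how often colVal was seen in columnName, or 0 if never. */
    public int count(String columnName, String colVal) {
        Map<String, Integer> empty = Collections.emptyMap();
        return counts.getOrDefault(columnName, empty).getOrDefault(colVal, 0);
    }

    public static void main(String[] args) {
        NestedFrequencyCounter counter = new NestedFrequencyCounter();
        // the two records from the example above
        counter.add("col1", "v");
        counter.add("col2", "v2");
        counter.add("col1", "v");
        counter.add("col2", "v3");

        System.out.println(counter.count("col1", "v"));  // prints 2
        System.out.println(counter.count("col2", "v3")); // prints 1
        System.out.println(counter.count("col3", "v"));  // prints 0
    }
}
```

`merge` inserts 1 for a value it has not seen and otherwise adds 1 to the stored count, which replaces the explicit `containsKey` branches.
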
樱娆 2024-10-18 22:16:43

There is a Multiset data structure that keeps track of the frequencies for you. Here is sample code using that data structure (from Google Guava).

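// needs: import com.google.common.collect.HashMultiset;
//        import com.google.common.collect.Multiset;
void frequencyCounter()
{
    // each element is a "column=value" key; the Multiset tracks how often it was added
    Multiset<String> counter = HashMultiset.create();

    counter.add("col1" + "=" + "x");
    counter.add("col2" + "=" + "x");
    counter.add("col2" + "=" + "x");

    System.out.println("how many times did col2 have the value x?");
    System.out.println(counter.count("col2" + "=" + "x")); // prints 2
}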

Points to note:

  • I am concatenating the column name (col1) and its value (x), with (=) as
    the delimiter, when adding to the Multiset.
  • I repeat the same concatenation to look up the frequency of a
    particular value in a given column.
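
The same concatenated-key trick can be stretched to the multi-column case from the question (col1=x together with green, which sits in col3 in the sample rows), and plain Java streams give something close to a LINQ-style query without any extra library. The following is a rough sketch under a few assumptions: each row is already parsed into a `String[]`, the `|` delimiter never occurs inside a value, and the names `StreamFrequencyDemo` and `countBy` are invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of any library.
class StreamFrequencyDemo {
    // The rows shown in the question; the elided "..." rows are left out.
    static final List<String[]> ROWS = Arrays.asList(
        new String[] {"x", "a", "green"},
        new String[] {"x", "b", "blue"},
        new String[] {"y", "c", "green"});

    /**
     * Groups rows by the values of the chosen column indexes (0-based),
     * joined with '|', and counts the rows in each group.
     */
    static Map<String, Long> countBy(List<String[]> rows, int... cols) {
        return rows.stream().collect(Collectors.groupingBy(
            row -> Arrays.stream(cols)
                         .mapToObj(i -> row[i])
                         .collect(Collectors.joining("|")),
            Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> byCol1 = countBy(ROWS, 0);
        Map<String, Long> byCol1AndCol3 = countBy(ROWS, 0, 2);

        System.out.println(byCol1.getOrDefault("x", 0L));              // prints 2
        System.out.println(byCol1AndCol3.getOrDefault("x|green", 0L)); // prints 1

        // A one-off query can also be written as a filter, LINQ "Where/Count" style.
        long n = ROWS.stream()
                     .filter(r -> r[0].equals("x") && r[2].equals("green"))
                     .count();
        System.out.println(n);                                         // prints 1
    }
}
```

Building the grouped map once pays off when many lookups follow; for a single ad hoc question the `filter(...).count()` form is simpler and needs no extra memory.
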