Best way to store data in a hash for flexible "pivot-table"-like calculations

Posted on 2024-11-01 20:54:55

I have a data set with following fields.

host name, model, location, port number, activated?, up?

I would convert them into a hash structure (perhaps similar to below)

my %switches = (
    a => {
        "hostname" => "SwitchA",
        "model" => "3750",
        "location" => "Building A",
        "total_ports" => 48,
        "configured_ports" => 30,
        "used_ports" => 24,
    },
    b => {
        "hostname" => "SwitchB",
        "model" => "3560",
        "location" => "Building B",
        "total_ports" => 48,
        "configured_ports" => 36,
        "used_ports" => 20,
    },
);

In the end I want to generate statistics such as:
No. of switches per building,
No. of switches of each model per building
Total no. of up ports per building

The statistics may not be restricted to per-building figures; they may also be per-switch (e.g., the number of switches that are 95% used). With the given data structure, how can I enumerate those counters?

Conversely, is there a better way to store my data? I can think of at least one format:

 # while iterating over records; $location, $model_name and $State come from the current record
 {
     $hash{$location}{$model_name}{count}++;
     $hash{$location}{up_ports}{count}++ if $State eq 'Active';
 }

What would be the better way to go about this? If I chose the first format (where all information is intact inside the hash) how can I mash the data to produce different statistics? (some example code snippets would be of great help!)

Comments (2)

蓝梦月影 2024-11-08 20:54:55

If you want querying flexibility, a "database" strategy is often good. You can do that directly, by putting the data into something like SQLite. Under that approach, you would be able to issue a wide variety of queries against the data without much coding of your own.
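As a rough illustration of the database route, here is a minimal sketch using the DBI module with the DBD::SQLite driver (both from CPAN, so they need to be installed). The table name `ports`, its column names, and the sample rows are assumptions made up for this example; only the field list comes from the question.

```perl
use strict;
use warnings;
use DBI;    # needs the DBD::SQLite driver installed from CPAN

# In-memory database; table and column names are made up for this sketch.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

$dbh->do(<<'SQL');
CREATE TABLE ports (
    hostname  TEXT,
    model     TEXT,
    location  TEXT,
    port      INTEGER,
    activated INTEGER,
    up        INTEGER
)
SQL

# Illustrative rows, one per port (not real data).
my @rows = (
    [ 'SwitchA', '3750', 'Building A', 1, 1, 1 ],
    [ 'SwitchA', '3750', 'Building A', 2, 1, 0 ],
    [ 'SwitchB', '3560', 'Building B', 1, 1, 1 ],
    [ 'SwitchC', '3750', 'Building A', 1, 0, 0 ],
);
my $insert = $dbh->prepare('INSERT INTO ports VALUES (?, ?, ?, ?, ?, ?)');
$insert->execute(@$_) for @rows;

# Switches per building: count distinct hostnames in each location.
my $switches_per_building = $dbh->selectall_arrayref(<<'SQL');
SELECT location, COUNT(DISTINCT hostname)
FROM ports GROUP BY location ORDER BY location
SQL

# Up ports per building: the up column holds 0 or 1, so SUM counts them.
my $up_per_building = $dbh->selectall_arrayref(<<'SQL');
SELECT location, SUM(up)
FROM ports GROUP BY location ORDER BY location
SQL

printf "%s: %d switches\n", @$_ for @$switches_per_building;
printf "%s: %d up ports\n", @$_ for @$up_per_building;
```

Each new statistic then becomes another `GROUP BY` query rather than another hand-written loop.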

Alternatively, if you're looking for a pure Perl approach, the way to approximate a database table is by using an array-of-arrays or, even better for code readability, an array-of-hashes. The outer array is like the database table. Each hash within that array is like a database record. Your Perl-based queries would end up looking like this:

my @query_result = grep {
    $_->{foo} == 1234 and
    $_->{bar} eq 'fubb'
} @data;
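Applied to the switch data from the question, the array-of-hashes idea might look like the sketch below. The records reuse the question's field names; SwitchC is a made-up third record added so the counts are not all 1, and the variable names are only illustrative.

```perl
use strict;
use warnings;

# One hash per switch, mirroring the fields in the question.
# SwitchC is an extra made-up record so the counts are not all 1.
my @data = (
    { hostname => 'SwitchA', model => '3750', location => 'Building A',
      total_ports => 48, configured_ports => 30, used_ports => 24 },
    { hostname => 'SwitchB', model => '3560', location => 'Building B',
      total_ports => 48, configured_ports => 36, used_ports => 20 },
    { hostname => 'SwitchC', model => '3750', location => 'Building A',
      total_ports => 48, configured_ports => 48, used_ports => 46 },
);

# Per-building counters, filled in a single pass over the records.
my (%switches_per_building, %models_per_building, %used_per_building);
for my $sw (@data) {
    my $loc = $sw->{location};
    $switches_per_building{$loc}++;
    $models_per_building{$loc}{ $sw->{model} }++;
    $used_per_building{$loc} += $sw->{used_ports};
}

# Switch-level query: switches with at least 95% of their ports in use.
my @nearly_full = grep { $_->{used_ports} / $_->{total_ports} >= 0.95 } @data;

for my $loc (sort keys %switches_per_building) {
    print "$loc: $switches_per_building{$loc} switches, ",
          "$used_per_building{$loc} used ports\n";
    print "  model $_: $models_per_building{$loc}{$_}\n"
        for sort keys %{ $models_per_building{$loc} };
}
print "95%+ used: ", join(', ', map { $_->{hostname} } @nearly_full), "\n";
```

Keeping the raw records intact and deriving each counter on demand means a new statistic only needs a new loop or `grep`, not a new storage layout.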

If you have so many rows that query performance becomes a bottleneck, you can create your own indexes, using a hash.

%data_by_switch = (
    'SwitchA' => [0, 4, 13, ...],  # Subscripts to @data.
    'SwitchB' => [1, 12, ...],
    ...
);
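Such an index does not have to be written by hand; it can be built in one pass over `@data`. The following is a small sketch that assumes per-port records with hypothetical `hostname`, `port`, and `up` fields.

```perl
use strict;
use warnings;

# Illustrative per-port records; the field names are assumptions.
my @data = (
    { hostname => 'SwitchA', port => 1, up => 1 },
    { hostname => 'SwitchB', port => 1, up => 0 },
    { hostname => 'SwitchA', port => 2, up => 0 },
    { hostname => 'SwitchB', port => 2, up => 1 },
    { hostname => 'SwitchA', port => 3, up => 1 },
);

# Build the index once: hostname => list of subscripts into @data.
my %data_by_switch;
for my $i (0 .. $#data) {
    push @{ $data_by_switch{ $data[$i]{hostname} } }, $i;
}

# Look up one switch through the index instead of scanning every row.
my @switch_a_rows = @data[ @{ $data_by_switch{SwitchA} } ];
my $switch_a_up   = grep { $_->{up} } @switch_a_rows;   # count in scalar context

print "SwitchA rows: @{ $data_by_switch{SwitchA} }\n";
print "SwitchA up ports: $switch_a_up\n";
```

The index has to be rebuilt or updated whenever `@data` changes, so it is only worth the effort once plain `grep` scans become measurably slow.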

My answer is based on answers I received for this question, which has some similarities with your question.

别忘他 2024-11-08 20:54:55

As far as I can see, you have a list of tuples. For the sake of discussion, it is enough to consider objects with two attributes, for example location and ports_used. So, for example:

(["locA", 23], ["locB", 42], ["locA", 13])  # just the values as tuples, no keys

And you want a result like:

("locA" => 36, "locB" => 42)

Is this correct? If so, what is the problem you are facing?
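If that reading is right, the aggregation itself is a single loop over the tuples that accumulates into a hash keyed by location. A minimal sketch, with variable names chosen only for illustration:

```perl
use strict;
use warnings;

# The tuples from the example above: [location, ports_used].
my @tuples = ( [ 'locA', 23 ], [ 'locB', 42 ], [ 'locA', 13 ] );

# Sum ports_used per location.
my %ports_used_by_location;
for my $tuple (@tuples) {
    my ($location, $ports_used) = @$tuple;
    $ports_used_by_location{$location} += $ports_used;
}

# Prints "locA => 36" and "locB => 42", one per line.
print "$_ => $ports_used_by_location{$_}\n"
    for sort keys %ports_used_by_location;
```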
