“TreeDict”可以做什么? (或树状图)在实践中有何用途?

发布于 2024-07-25 03:01:59 字数 1543 浏览 13 评论 0原文

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

隔岸观火 2024-08-01 03:01:59

我看到几个答案都指向“按有序序列行走”功能,这确实很重要,但没有一个答案强调另一个重要功能,即“查找带有键> =此的第一个条目”。 即使没有真正需要从那里“步行”,这也有很多用途。

例如(这在最近的 SO 答案中出现),假设您想生成具有给定相对频率的伪随机值 - 即,您给出了一个字典 d:

{'wolf': 42, 'sheep': 15, 'dog': 23, 'goat': 15, 'cat': 5}

并且需要一种生成“狼”的方法,其概率为 100 中的 42(因为 100 是给定的相对频率的总和),“羊”的概率为 100 中的 15,等等; 并且不同值的数量可能非常大,相对频率也是如此。

然后,将给定值(以任何顺序)存储为树图中的值,相应的键是截至该点的“总累积频率”。 即:

def preprocess(d):
    tot = 0
    for v in d:
        tot += d[v]
        treemap.insert(key=tot, value=v)
    return tot, treemap

现在,生成一个值可以非常快(O(log(len(d)))),如下所示:

def generate(tot, treemap, r=random):
    n = r.randrange(tot)
    return treemap.firstGTkey(n).value

其中 firstGTKey 是返回第一个的方法条目(在此假设示例中具有 .key.value 属性),其中 key > 给定的参数。 例如,我已将这种方法用于存储为 B 树的大文件(例如使用 bsddb.bt_openset_location 方法)。

I've seen several answers pointing to the "walk in ordered sequence" feature, which is indeed important, but none highlighting the other big feature, which is "find first entry with a key >= this". This has many uses even when there's no real need to "walk" from there.

For example (this came up in a recent SO answer), say you want to generate pseudo-random values with given relative frequencies -- i.e, you're given, say, a dict d:

{'wolf': 42, 'sheep': 15, 'dog': 23, 'goat': 15, 'cat': 5}

and need a way to generate 'wolf' with a probability of 42 out of 100 (since 100 is the total of the relative frequencies given), 'sheep' 15 out of 100, and so on; and the number of distinct values can be quite large, as can the relative frequencies.

Then, store the given values (in whatever order) as the values in a tree map, with the corresponding keys being the "total cumulative frequency" up to that point. I.e.:

def preprocess(d):
    tot = 0
    for v in d:
        tot += d[v]
        treemap.insert(key=tot, value=v)
    return tot, treemap

Now, generating a value can be pretty fast (O(log(len(d)))), as follows:

def generate(tot, treemap, r=random):
    n = r.randrange(tot)
    return treemap.firstGTkey(n).value

where firstGTKey is a method that returns the first entry (with .key and .value attributes, in this hypothetical example) with a key > the given argument. I've used this approach with large files stored as B-Trees, for example (using e.g. bsddb.bt_open and the set_location method).

未央 2024-08-01 03:01:59

当您需要按键顺序浏览字典时,它非常有用; 有时会出现。 事实上,我发现它在某些编程竞赛中比其他任何竞赛(例如 ACM 等)更常见。

TreeMap 最有用的功能是当您想要快速找到最小或最大键时; 使用排序字典,这通常是单个方法调用; 并且在算法上可以在 O(log(n)) 时间内完成,而不是在集合未排序的情况下迭代每个键以查找最小值/最大值。 基本上,一个更友好的界面。

我遇到的最常见的情况之一是当对象由特定名称标识时,并且您想要打印出根据名称排序的对象; 比如说从目录名称到目录中文件数量的映射。

我使用它的另一个地方是在 Excel 电子表格包装中; 从行号到行对象的映射。 这使您可以快速找到最后一行索引,而无需循环遍历每一行。

此外,当您可以轻松地定义键上的比较关系(但不一定是 HashMap 所需的哈希函数)时,它也很有用。 我能想到的最好的(虽然很弱)的例子是不区分大小写的字符串键。

It's useful when you need to go through a Dictionary in order of the keys; which comes up on occasion. I've actually found its infinitely more common in certain programming contests then anything else (think ACM, etc).

The most useful feature of a TreeMap is when you want to quickly find the min or max key; using a sorted dictionary this is often a single method call; and algorithmically can be done in O(log(n)) time, as opposed to iterating over each key looking for a min/max if the collection is unsorted. Basically, a much friendlier interface.

One of the more common times I run into it is when objects are identified by a specific name, and you want to print out the objects ordered according to the name; say a mapping from directory name to number of files in a directory.

One other place I've used it is in an excel spreadsheet wrapper; mapping from row number to row object. This lets you quickly find the last row index, without looping through each row.

Also, it's useful when you can easily define a comparison relation on keys, but not necessarily a hashing function, as needed for HashMaps. The best (though weak) example I can think of is case insensitive string keys.

記憶穿過時間隧道 2024-08-01 03:01:59

保持元素排序的原因是为了更快地检索。 假设我希望字典中的所有值都在排序范围内。 使用 TreeDict 比使用常规哈希图要快得多。 它基本上允许您按排序顺序保留字典中的所有内容。 我知道在我当前正在开发的应用程序中使用这样的类来基本上查询数据结构。

The reason for keeping the elements in sorted order is for faster retrieval. Say I wanted all of the values in the dictionary in a sorted range. This is much faster with a TreeDict then with the regular hashmap. It basically allows you to keep everything in the dictionary in sorted order. I know in the application I'm currently working on uses a class like this to basically query the data structure.

向日葵 2024-08-01 03:01:59

在处理工业过程数据时,我经常使用 Dict--
阀门打开/关闭、机械启动/停止等。

当我需要在相当长的时间内比较启动/停止或打开/关闭事件之间的时间间隔时,对键进行排序特别有用。

然而,自从我能够在 C# 中使用 linq 以来,我发现使用 IEnumerables 并使用 IQueryable 扩展方法来获取我需要的信息通常更容易。

I often use Dict<DateTime, someClassOrValue> when working with industrial process data--
Valve open/close, machinery start/stop, etc.

Having the keys sorted is especially useful when I need to compare time intervals between start/stop or open/close events in a decent amount of time.

However, since I've been able to use linq in C# I've found that it's often easier to just work with IEnumerables and use the IQueryable extension methods to get the information I need.

用心笑 2024-08-01 03:01:59

几乎所有“GROUP BY”报告都需要排序的字典。

summary = sortedDefaultDict()
for row in somePileOfData:
    summary[row.group_by] += row.balance
for k in sorted(summary.keys()):
    print k, summary[k]

这在数据仓库应用程序中经常执行,因此很难表达它的重要性。

如果 sorted 函数调用不起作用,从长远来看,它可以节省大量时间。

Almost all "GROUP BY" reporting requires a sorted dictionary.

summary = sortedDefaultDict()
for row in somePileOfData:
    summary[row.group_by] += row.balance
for k in sorted(summary.keys()):
    print k, summary[k]

This is done so often in data warehousing applications, that it's difficult to express how central this is.

If the sorted function call does no work, it saves a ton of time in the long run.

明明#如月 2024-08-01 03:01:59

它们可以使各种算法更容易实现。

They can make various algorithms easier to implement.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文