We don’t allow questions seeking recommendations for software libraries, tutorials, tools, books, or other off-site resources. You can edit the question so it can be answered with facts and citations.
Closed 8 months ago.
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
接受
或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
发布评论
评论(6)
我看到几个答案都指向“按有序序列行走”功能,这确实很重要,但没有一个答案强调另一个重要功能,即“查找带有键> =此的第一个条目”。 即使没有真正需要从那里“步行”,这也有很多用途。
例如(这在最近的 SO 答案中出现),假设您想生成具有给定相对频率的伪随机值 - 即,您给出了一个字典
d
:并且需要一种生成“狼”的方法,其概率为 100 中的 42(因为 100 是给定的相对频率的总和),“羊”的概率为 100 中的 15,等等; 并且不同值的数量可能非常大,相对频率也是如此。
然后,将给定值(以任何顺序)存储为树图中的值,相应的键是截至该点的“总累积频率”。 即:
现在,生成一个值可以非常快(
O(log(len(d)))
),如下所示:其中
firstGTKey
是返回第一个的方法条目(在此假设示例中具有.key
和.value
属性),其中 key > 给定的参数。 例如,我已将这种方法用于存储为 B 树的大文件(例如使用bsddb.bt_open
和set_location
方法)。I've seen several answers pointing to the "walk in ordered sequence" feature, which is indeed important, but none highlighting the other big feature, which is "find first entry with a key >= this". This has many uses even when there's no real need to "walk" from there.
For example (this came up in a recent SO answer), say you want to generate pseudo-random values with given relative frequencies -- i.e, you're given, say, a dict
d
:and need a way to generate 'wolf' with a probability of 42 out of 100 (since 100 is the total of the relative frequencies given), 'sheep' 15 out of 100, and so on; and the number of distinct values can be quite large, as can the relative frequencies.
Then, store the given values (in whatever order) as the values in a tree map, with the corresponding keys being the "total cumulative frequency" up to that point. I.e.:
Now, generating a value can be pretty fast (
O(log(len(d)))
), as follows:where
firstGTKey
is a method that returns the first entry (with.key
and.value
attributes, in this hypothetical example) with a key > the given argument. I've used this approach with large files stored as B-Trees, for example (using e.g.bsddb.bt_open
and theset_location
method).当您需要按键顺序浏览字典时,它非常有用; 有时会出现。 事实上,我发现它在某些编程竞赛中比其他任何竞赛(例如 ACM 等)更常见。
TreeMap 最有用的功能是当您想要快速找到最小或最大键时; 使用排序字典,这通常是单个方法调用; 并且在算法上可以在 O(log(n)) 时间内完成,而不是在集合未排序的情况下迭代每个键以查找最小值/最大值。 基本上,一个更友好的界面。
我遇到的最常见的情况之一是当对象由特定名称标识时,并且您想要打印出根据名称排序的对象; 比如说从目录名称到目录中文件数量的映射。
我使用它的另一个地方是在 Excel 电子表格包装中; 从行号到行对象的映射。 这使您可以快速找到最后一行索引,而无需循环遍历每一行。
此外,当您可以轻松地定义键上的比较关系(但不一定是 HashMap 所需的哈希函数)时,它也很有用。 我能想到的最好的(虽然很弱)的例子是不区分大小写的字符串键。
It's useful when you need to go through a Dictionary in order of the keys; which comes up on occasion. I've actually found its infinitely more common in certain programming contests then anything else (think ACM, etc).
The most useful feature of a TreeMap is when you want to quickly find the min or max key; using a sorted dictionary this is often a single method call; and algorithmically can be done in O(log(n)) time, as opposed to iterating over each key looking for a min/max if the collection is unsorted. Basically, a much friendlier interface.
One of the more common times I run into it is when objects are identified by a specific name, and you want to print out the objects ordered according to the name; say a mapping from directory name to number of files in a directory.
One other place I've used it is in an excel spreadsheet wrapper; mapping from row number to row object. This lets you quickly find the last row index, without looping through each row.
Also, it's useful when you can easily define a comparison relation on keys, but not necessarily a hashing function, as needed for HashMaps. The best (though weak) example I can think of is case insensitive string keys.
保持元素排序的原因是为了更快地检索。 假设我希望字典中的所有值都在排序范围内。 使用 TreeDict 比使用常规哈希图要快得多。 它基本上允许您按排序顺序保留字典中的所有内容。 我知道在我当前正在开发的应用程序中使用这样的类来基本上查询数据结构。
The reason for keeping the elements in sorted order is for faster retrieval. Say I wanted all of the values in the dictionary in a sorted range. This is much faster with a TreeDict then with the regular hashmap. It basically allows you to keep everything in the dictionary in sorted order. I know in the application I'm currently working on uses a class like this to basically query the data structure.
在处理工业过程数据时,我经常使用
Dict
--阀门打开/关闭、机械启动/停止等。
当我需要在相当长的时间内比较启动/停止或打开/关闭事件之间的时间间隔时,对键进行排序特别有用。
然而,自从我能够在 C# 中使用 linq 以来,我发现使用 IEnumerables 并使用 IQueryable 扩展方法来获取我需要的信息通常更容易。
I often use
Dict<DateTime, someClassOrValue>
when working with industrial process data--Valve open/close, machinery start/stop, etc.
Having the keys sorted is especially useful when I need to compare time intervals between start/stop or open/close events in a decent amount of time.
However, since I've been able to use linq in C# I've found that it's often easier to just work with IEnumerables and use the IQueryable extension methods to get the information I need.
几乎所有“GROUP BY”报告都需要排序的字典。
这在数据仓库应用程序中经常执行,因此很难表达它的重要性。
如果
sorted
函数调用不起作用,从长远来看,它可以节省大量时间。Almost all "GROUP BY" reporting requires a sorted dictionary.
This is done so often in data warehousing applications, that it's difficult to express how central this is.
If the
sorted
function call does no work, it saves a ton of time in the long run.它们可以使各种算法更容易实现。
They can make various algorithms easier to implement.