How can I randomly select a key from a Map, based on its Integer value relative to the other values, in O(n) time?
If we have a Map<T, Integer>, let's say the Integer value represents "how many" Ts there are. Thus, I want to select a T with probability proportional to its Integer value. If the map contains Strings with "a"=4 and "b"=6, then I want it so that 40% of the time "a" is selected and 60% of the time "b" is selected.
Most importantly, I'd like this in O(n), n being two (not ten) in my previous example. I originally made an ArrayList containing each key repeated as many times as its value (and simply returned a random index), but that process is not only very slow, it is completely counterintuitive for what the Map<T, Integer> represents.
Using an arraylist would actually be even faster than using a Map, because you can do it in O(1).
The only way this is a bad thing is if order matters (A A B B A B vs A B B A B A or something) but it's obvious it doesn't because you're using a Map which has no ordering...
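A sketch of that expanded-list idea (illustrative, not the commenter's code; the class and method names here are made up): each key is added once per unit of its weight, so selection is a single random index lookup.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

class ExpandedListPicker<T> {
    private final List<T> expanded = new ArrayList<>();
    private final Random rng = new Random();

    // Build the expanded list: each key appears once per unit of its weight.
    ExpandedListPicker(Map<T, Integer> weights) {
        for (Map.Entry<T, Integer> e : weights.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                expanded.add(e.getKey());
            }
        }
    }

    // O(1) selection: a uniformly random index already respects the weights.
    T pick() {
        return expanded.get(rng.nextInt(expanded.size()));
    }
}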
OP here.
I came up with an elegant solution! To clear up any misunderstandings: my original idea of storing every key in an ArrayList as many times as its value completely disregarded the point of using a Map to store "instances of the Key using Integers"; any similar solution is counterproductive! Assuming the Map is unordered, here is my solution:
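A sketch of that solution as I read the description below (drawing the random value uniformly below the total of the values is an assumption; lastElement follows the naming used in the explanation):

import java.util.Map;
import java.util.Random;

class CumulativeSumPicker {
    // One O(n) pass (plus one pass to total the weights): pick a key with
    // probability proportional to its Integer value.
    static <T> T pick(Map<T, Integer> map, Random rng) {
        int total = 0;
        for (int v : map.values()) {
            total += v;
        }
        int randomValue = rng.nextInt(total);   // assumed: uniform in [0, total)
        int sum = 0;
        T lastElement = null;
        for (Map.Entry<T, Integer> e : map.entrySet()) {
            // Compare the random value with the current sum + the current element's value.
            if (randomValue < sum + e.getValue()) {
                return e.getKey();
            }
            sum += e.getValue();
            lastElement = e.getKey();
        }
        // If the random value was never smaller, fall back to the last element.
        return lastElement;
    }
}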
It compares the random value with the current sum + the current element's value. If it is less than that, we return the current key. Else, keep going and add that value to the sum. If the random value is never less than any of those sums, we return lastElement. Hope this clears it up.
Sorry for the delay, but I think I have a relatively elegant solution with O(n lg n) construction time and O(lg n) fetch-a-random-element time. Here goes.
WeightedProbMap: This class implements the random element generator. It is constructed based on an Iterable; see Test.java below.
Pair.java: Just a simple Pair class.
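A sketch of Pair.java and WeightedProbMap.java consistent with the description above and with the explanation further down (this.elts, nextElt(), headMap, and lastKey are named there; the remaining names and details are assumptions):

import java.util.Random;
import java.util.SortedMap;
import java.util.TreeMap;

// Pair.java: a minimal immutable pair.
class Pair<A, B> {
    private final A first;
    private final B second;

    Pair(A first, B second) {
        this.first = first;
        this.second = second;
    }

    A first()  { return first; }
    B second() { return second; }
}

// WeightedProbMap.java: maps cumulative weights to elements via a SortedMap.
class WeightedProbMap<E> {
    private final SortedMap<Integer, E> elts = new TreeMap<>();
    private final Random rand = new Random();
    private int total = 0;

    // O(n lg n) construction: each element is keyed by the cumulative weight
    // of everything inserted before it.
    WeightedProbMap(Iterable<Pair<Integer, E>> weightedElts) {
        for (Pair<Integer, E> p : weightedElts) {
            if (p.first() <= 0) continue;          // skip non-positive weights
            this.elts.put(this.total, p.second());
            this.total += p.first();
        }
    }

    // O(lg n) fetch: view all keys strictly smaller than a random index,
    // then return the element stored at the greatest such key.
    E nextElt() {
        int index = rand.nextInt(this.total) + 1;              // uniform in [1, total]
        SortedMap<Integer, E> view = this.elts.headMap(index);
        return view.get(view.lastKey());
    }
}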
Test.java: This is a very simple test harness for the WeightedProbMap (WPM) class. We build an ArrayList of elements with associated weights, use that to construct a WPM, and then get 10,000 samples from the WPM to see if elements appear with the expected frequency.
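A sketch of such a Test.java (the particular weights and the second string are assumptions; "Hello" matches the run command below):

import java.util.ArrayList;

// Test.java: build weighted elements, construct a WPM, and sample it 10,000 times.
class Test {
    public static void main(String[] args) {
        ArrayList<Pair<Integer, String>> elts = new ArrayList<>();
        elts.add(new Pair<>(4, "Hello"));   // weights and the second string are assumed
        elts.add(new Pair<>(6, "World"));

        WeightedProbMap<String> wpm = new WeightedProbMap<>(elts);

        // With weights 4 and 6, roughly 40% of the printed lines should be "Hello".
        for (int i = 0; i < 10000; i++) {
            System.out.println(wpm.nextElt());
        }
    }
}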
Testing this: change one or both of the elts.add(...) lines in Test.java.
Compile with:
$ javac Pair.java WeightedProbMap.java Test.java
Run with (for example, in Unix):
$ java Test | grep "Hello" | wc -l
This will give you the count for that particular execution.
Explanation:
constructor: The WeightedProbMap (WPM) class uses a java.util.SortedMap to map cumulative weights to elements.
nextElt(): A SortedMap stores its data by key order, which allows it to cheaply provide 'views' of subsets of the map. In particular, the line that builds view (this.elts.headMap(index) in the sketch above) returns a view of the original map (this.elts) with only the keys that are strictly smaller than index. This operation (headMap) is constant time: view takes O(1) time to construct, and if you were to change this.elts later on, the changes would be reflected in view as well.
Once we create the view of everything less than a random number, we now just have to find the greatest key in that subset. We do that with SortedMap.lastKey(), which, for a TreeMap, should take Θ(lg n) time.
To do this, you have to cache the relative frequency of each value T. This gives you your O(n) probability-distribution for the price of an O(n) insertion-cost (you have to update the relative frequency of every T upon every insertion).
If you can store the total sum, that's quite easily done:
Just store the pairs (T, int) as a class or whatever in an ordinary array and then go over it:
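A sketch of that loop (illustrative only: the pair class, its names, and passing in the stored total weight are assumptions):

import java.util.Random;

class WeightedPair<T> {
    final T value;
    final int weight;

    WeightedPair(T value, int weight) {
        this.value = value;
        this.weight = weight;
    }

    // One pass over an ordinary array of (T, int) pairs; the total weight is
    // assumed to be stored alongside the array, as suggested above.
    static <T> T pick(WeightedPair<T>[] pairs, int totalWeight, Random rng) {
        int r = rng.nextInt(totalWeight);
        for (WeightedPair<T> p : pairs) {
            r -= p.weight;
            if (r < 0) {
                return p.value;
            }
        }
        // Unreachable if totalWeight really is the sum of the weights.
        throw new IllegalStateException("totalWeight does not match the pairs");
    }
}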
Can't get much faster than that: looping through an ArrayList is the most efficient way to iterate over n values, and you obviously can't do better than O(n). The only overhead is the nextInt(), and you need that (or something similar) in every solution anyway.
Depending on how you organize the ArrayList (sorted or not), other operations get cheaper or more expensive, but that's unimportant for this particular operation.
Edit: Although, thinking about it, "you obviously need O(n)" isn't true. If you change the values in the array rarely, can afford an expensive preparation step, and memory isn't a problem, you can do better by storing a HashMap.
If you've got for example a distribution:
T0: 2
T1: 3
T2: 1
You could insert (0, T0), (1, T0), (2, T1), (3, T1), (4, T1), (5, T2) into the HashMap.
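A sketch of that lookup table (my reading of the idea, not the answerer's code):

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class DenseIndexPicker {
    // Expensive preparation: one entry per unit of weight, e.g. for T0:2, T1:3, T2:1
    // the table holds 0->T0, 1->T0, 2->T1, 3->T1, 4->T1, 5->T2.
    static <T> Map<Integer, T> build(Map<T, Integer> weights) {
        Map<Integer, T> lookup = new HashMap<>();
        int i = 0;
        for (Map.Entry<T, Integer> e : weights.entrySet()) {
            for (int k = 0; k < e.getValue(); k++) {
                lookup.put(i++, e.getKey());
            }
        }
        return lookup;
    }

    // Selection is then a single O(1) lookup.
    static <T> T pick(Map<Integer, T> lookup, Random rng) {
        return lookup.get(rng.nextInt(lookup.size()));
    }
}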
Edit2: Or see phooji's approach which should be feasible for larger sets of data.
Build an inverse map, Map<Integer,T>, so that every key is the sum of all the weights processed so far. For example, with the question's map of "a"=4 and "b"=6, the inverse map is 4="a", 10="b". (For better performance, you can arrange your weights in descending order first.) Then generate an evenly distributed random number between 0 and the sum of all weights, and perform a binary search for this number in the key set of the inverse map.
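A sketch of this approach (as an assumption, a TreeMap stands in for the sorted inverse map, with higherEntry() playing the role of the binary search):

import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

class InverseMapPicker<T> {
    private final TreeMap<Integer, T> inverse = new TreeMap<>();
    private final Random rng = new Random();
    private int total = 0;

    // Each key is the running total of the weights processed so far,
    // e.g. "a"=4, "b"=6 becomes {4="a", 10="b"}.
    InverseMapPicker(Map<T, Integer> weights) {
        for (Map.Entry<T, Integer> e : weights.entrySet()) {
            if (e.getValue() <= 0) continue;   // skip non-positive weights
            total += e.getValue();
            inverse.put(total, e.getKey());
        }
    }

    // Draw r uniformly from [0, total); higherEntry(r) finds the smallest
    // cumulative sum strictly greater than r, i.e. the binary-search step.
    T pick() {
        int r = rng.nextInt(total);
        return inverse.higherEntry(r).getValue();
    }
}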