How can I randomly select a key from a Map in O(n) time, weighted by its Integer value relative to the other values?

Posted on 2024-10-20 07:23:28 · 330 words · 6 views · 0 comments

If we have a Map<T, Integer>, let's say the Integer value represents "how many" Ts there are. Thus, I want to uniformly select a T based on its Integer value. If the map contains Strings with "a"=4 and "b"=6, then I want it so that 40% of the time "a" is selected and 60% of the time "b" is selected.

Most importantly, I'd like this in O(n), where n is the number of keys: two (not ten) in my previous example. I originally built an ArrayList containing each key repeated as many times as its value (and simply returned the element at a random index), but this process is not only very slow, it is also completely counterintuitive given what the Map<T, Integer> represents.

Comments (6)

南烟 2024-10-27 07:23:29

Using an ArrayList would actually be even faster than using a Map, because each pick can be done in O(1).

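import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class RandVal<T> {

    // Holds each key once per occurrence, e.g. four "a"s and six "b"s.
    List<T> list = new ArrayList<T>();
    Random rand = new Random();

    // Picks a uniformly random index, so each key is chosen in proportion to its count.
    public T randomValue() {
        int next = rand.nextInt(list.size());
        return list.get(next);
    }

}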

The only case where this would be a bad thing is if order matters (A A B B A B vs A B B A B A or something), but it evidently doesn't, because you're using a Map, which has no ordering...
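The answer above doesn't show how the list gets filled. A minimal sketch, assuming the counts start out in a Map<T, Integer> as in the question (the ExpandedList class and its expand method are hypothetical names, not part of the answer):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class ExpandedList {
    // Hypothetical helper: repeats each key as many times as its count.
    static <T> List<T> expand(Map<T, Integer> counts) {
        List<T> list = new ArrayList<>();
        for (Map.Entry<T, Integer> e : counts.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                list.add(e.getKey());
            }
        }
        return list;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("a", 4);
        counts.put("b", 6);
        List<String> list = expand(counts); // four "a"s followed by six "b"s
        Random rand = new Random();
        // Each pick is O(1) once the list exists.
        System.out.println(list.get(rand.nextInt(list.size())));
    }
}
```

Note the trade-off: building the list takes time and memory proportional to the sum of the counts (ten here), not to the number of keys (two), which is what the question was trying to avoid.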

抽个烟儿 2024-10-27 07:23:29

OP here.

I came up with an elegant solution! To clear up any misunderstanding: my original idea of storing each key in an ArrayList once per count completely disregarded the point of using a Map to store "instances of the Key using Integers"; any similar solution is counterproductive! Assuming the Map is unordered, here is my solution:

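public T randomPick(Random r) {
    // size() is assumed to return the sum of all counts, not the number of keys.
    int randomValue = r.nextInt(size());
    int currentSum = 0;
    T lastElement = null;

    // Accumulate counts until the running sum passes randomValue.
    for (Map.Entry<T, Integer> e : map.entrySet()) {
        currentSum += e.getValue();
        if (randomValue < currentSum) {
            return e.getKey();
        }
        lastElement = e.getKey();
    }
    // Only reached if size() is larger than the sum of the counts.
    return lastElement;
}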

It compares the random value with the current sum plus the current element's value. If the random value is smaller, we return the current key; otherwise we add that value to the sum and keep going. If the random value is never smaller than any of these running totals, we return lastElement.
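To check the running-sum logic by hand, here is a self-contained sketch that takes the draw as a parameter. The LinearScanPicker class and its pickAt method are hypothetical names, and it assumes the total is recomputed on each call rather than cached behind size():

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

class LinearScanPicker {
    // Returns the key whose slice of [0, total) contains draw.
    // Assumes 0 <= draw < total, where total is the sum of all counts.
    static <T> T pickAt(Map<T, Integer> counts, int draw) {
        int currentSum = 0;
        for (Map.Entry<T, Integer> e : counts.entrySet()) {
            currentSum += e.getValue();
            if (draw < currentSum) {
                return e.getKey();
            }
        }
        throw new IllegalArgumentException("draw must be below the total count");
    }

    // One O(n) pass to sum the counts, one O(n) pass to find the key.
    static <T> T randomPick(Map<T, Integer> counts, Random r) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return pickAt(counts, r.nextInt(total));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("a", 4);
        counts.put("b", 6);
        // With this insertion order, draws 0..3 map to "a" and 4..9 to "b".
        System.out.println(randomPick(counts, new Random()));
    }
}
```

With "a"=4 and "b"=6, four of the ten possible draws land on "a" and six on "b", which gives the 40%/60% split asked for in the question.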

Hope this clears it up.

浮光之海 2024-10-27 07:23:28

Sorry for the delay, but I think I have a relatively elegant solution with O(n lg n) construction time and O(lg n) fetch-a-random-element time. Here goes.


WeightedProbMap:
This class implements the random element generator. It is constructed based on an Iterable; see Test.java below.

import java.util.Random;
import java.util.SortedMap;
import java.util.TreeMap;

class WeightedProbMap<EltType>  {
    private SortedMap<Integer, EltType> elts = new TreeMap<Integer, EltType>();
    private Random rand = new Random();
    private int sum = 0;

    // assume: each weight is > 0; there is at least one element;
    //         elements should not be repeated
    // ensure: this.elts maps cumulative weights to elements;
    //         this.sum is the total weight
    public WeightedProbMap(Iterable<Pair<Integer, EltType>> weights) {
        for (Pair<Integer, EltType> e : weights) {
            this.elts.put(this.sum, e.second);
            this.sum += e.first;
        }
    }

    // assume: this was initialized properly (cf. constructor req)
    // ensure: return an EltType with relative probability proportional
    //         to its associated weight
    public EltType nextElt() {
        int index = this.rand.nextInt(this.sum) + 1;
        SortedMap<Integer, EltType> view = this.elts.headMap(index);
        return view.get(view.lastKey());
    }
}

Pair.java: Just a simple Pair class.

class Pair<X, Y> {
    public Pair(X x, Y y) {
        first = x;
        second = y;
    }

    X first;
    Y second;
}

Test.java: This is a very simple test harness for the WeightedProbMap (WPM) class. We build an ArrayList of elements with associated weights, use that to construct a WPM, and then get 10,000 samples from the WPM to see if elements appear with the expected frequency.

import java.util.ArrayList;

class Test {
    public static void main(String argc[]) {
        ArrayList<Pair<Integer, String> > elts = new ArrayList<Pair<Integer, String>>();
        elts.add(new Pair<Integer, String>(20, "Hello"));
        // elts.add(new Pair<Integer, String>(70, "World"));
        // elts.add(new Pair<Integer, String>(10, "Ohai"));

        WeightedProbMap<String> wpm = new WeightedProbMap<String>(elts);

        for (int i = 0; i < 10000; ++i) {
            System.out.println(wpm.nextElt());
        }
    }
}

Testing this:

  1. Uncomment one or both of the elts.add(...) lines in Test.java.
  2. Compile with:

    $ javac Pair.java WeightedProbMap.java Test.java

  3. Run with (for example, in Unix):

    $ java Test | grep "Hello" | wc -l

This will give you the count for that particular execution.


Explanation:

constructor:
The WeightedProbMap (WPM) class uses a java.util.SortedMap to map cumulative weights to elements. A graphical explanation:

The constructor takes weights...     ...and creates a mapping from the
      3 +---+                            number line:
        |   | 
  2 +---+   +---+ 2                   0      2         5      7
    |   |   |   |                     +------+---------+------+
    |   |   |   |                     |   X  |    Y    |   Z  |
  --+---+---+---+--                   +------+---------+------+
      X   Y   Z

nextElt():
A SortedMap stores its data by key order, which allows it to cheaply provide 'views' of subsets of the map. In particular, the line

SortedMap<Integer, EltType> view = this.elts.headMap(index)

returns a view of the original map (this.elts) with only the keys that are strictly smaller than index. This operation (headMap) is constant time: view takes O(1) time to construct, and if you were to change this.elts later on, the changes would be reflected in view as well.

Once we create the view of everything less than a random number, we now just have to find the greatest key in that subset. We do that with SortedMap.lastKey(), which, for a TreeMap, should take Θ(lg n) time.
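As a variation on the headMap/lastKey pair, the same cumulative-weight lookup can be sketched with TreeMap.floorEntry, which returns the entry with the greatest key less than or equal to the argument. This is only an illustrative sketch: FloorEntryPicker, add and eltAt are hypothetical names, and the draw is taken from [0, sum) instead of [1, sum] so that it lines up with floorEntry's "less than or equal" rule.

```java
import java.util.Random;
import java.util.TreeMap;

class FloorEntryPicker<E> {
    // Maps the start of each element's slice of [0, sum) to the element.
    private final TreeMap<Integer, E> starts = new TreeMap<>();
    private int sum = 0;

    // Assumes weight > 0.
    void add(int weight, E elt) {
        starts.put(sum, elt);
        sum += weight;
    }

    // Assumes 0 <= draw < sum; finds the slice containing draw in O(lg n).
    E eltAt(int draw) {
        return starts.floorEntry(draw).getValue();
    }

    E nextElt(Random rand) {
        return eltAt(rand.nextInt(sum));
    }

    public static void main(String[] args) {
        FloorEntryPicker<String> picker = new FloorEntryPicker<>();
        picker.add(2, "X"); // slice [0, 2)
        picker.add(3, "Y"); // slice [2, 5)
        picker.add(2, "Z"); // slice [5, 7)
        System.out.println(picker.nextElt(new Random()));
    }
}
```

The weights 2, 3, 2 match the diagram above: the keys stored in the tree are 0, 2 and 5, and any draw in [2, 5) resolves to Y.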

虫児飞 2024-10-27 07:23:28

To do this, you have to cache the relative frequency of each value T. This gives you your O(n) probability distribution at the price of an O(n) insertion cost, since you have to update the relative frequency of every T upon every insertion.

み零 2024-10-27 07:23:28

If you can store the total sum, that's quite easily done:

Just store the pairs (T, int) as a class or whatever in an ordinary array and then go over it:

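// random is a java.util.Random instance; total is the sum of all p.val.
int val = random.nextInt(total);
for (Pair p : pairs) {
    val -= p.val;            // consume this pair's share of the range
    if (val < 0) return p;   // the draw fell inside this pair's share
}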

You can't get much faster than this: looping through an ArrayList is the most efficient way to iterate through n values, and you can obviously not do better than O(n). The only overhead is the nextInt() call, and every solution needs that (or something similar) as well.
Depending on how you organize the ArrayList (sorted or not), other operations get cheaper or more expensive, but that is unimportant for this particular operation.

Edit: On reflection, the "you obviously need O(n)" claim isn't true. If you rarely change the values in the array, can afford an expensive preparation step, and memory isn't a problem, you can do better by storing a HashMap.
For example, suppose you have the distribution:
T0: 2
T1: 3
T2: 1

You could insert (0, T0), (1, T0), (2, T1), ..., (4, T1), (5, T2) into the HashMap and then look up a random index from 0 to 5 in expected O(1) time.

Edit2: Or see phooji's approach which should be feasible for larger sets of data.

秋心╮凉 2024-10-27 07:23:28

Build an inverse map, Map<Integer, T>, so that every key is the sum of all the weights processed so far.

For example if you have this map:

T1 -> 10
T2 -> 8
T3 -> 3

This inverse map is:

10 -> T1
18 -> T2
21 -> T3

(For better performance, you can arrange your weights in descending order first.)

Then generate an evenly distributed random number between 0 and the sum of all weights, and perform a binary search for this number in the key set of the inverse map.
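A minimal sketch of that binary search, assuming the cumulative sums are kept in a sorted int array rather than in a Map's key set, and that the random number is drawn from [0, total). CumulativePicker, cumulative and indexFor are hypothetical names; the returned index identifies T1, T2, T3 by position:

```java
import java.util.Arrays;
import java.util.Random;

class CumulativePicker {
    // Turns weights {10, 8, 3} into running totals {10, 18, 21}.
    // Assumes every weight is > 0, so the totals are strictly increasing.
    static int[] cumulative(int[] weights) {
        int[] cum = new int[weights.length];
        int sum = 0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i];
            cum[i] = sum;
        }
        return cum;
    }

    // Returns the first index whose running total is greater than draw.
    // Assumes 0 <= draw < cum[cum.length - 1].
    static int indexFor(int[] cum, int draw) {
        int pos = Arrays.binarySearch(cum, draw);
        // Exact hit: draw equals a boundary, so it belongs to the next slice.
        // Miss: binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    static int pick(int[] cum, Random r) {
        return indexFor(cum, r.nextInt(cum[cum.length - 1]));
    }

    public static void main(String[] args) {
        int[] cum = cumulative(new int[] {10, 8, 3});
        System.out.println("T" + (pick(cum, new Random()) + 1));
    }
}
```

Building the array is O(n) and each pick is O(lg n). Draws 0..9 select index 0 (T1), 10..17 select index 1 (T2), and 18..20 select index 2 (T3).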
