从加权列表中选择一个随机项目

发布于 2024-12-04 00:14:39 字数 826 浏览 3 评论 0原文

我正在尝试编写一个程序,从 美国人口普查姓氏列表中选择一个随机姓名。列表格式是

Name           Weight Cumulative line
-----          -----  -----      -
SMITH          1.006  1.006      1
JOHNSON        0.810  1.816      2
WILLIAMS       0.699  2.515      3
JONES          0.621  3.136      4
BROWN          0.621  3.757      5
DAVIS          0.480  4.237      6

假设我将数据加载到一个结构中,例如

Class Name
{
    public string Name {get; set;}
    public decimal Weight {get; set;}
    public decimal Cumulative {get; set;}
}

什么数据结构最适合保存名称列表,以及从列表中选择随机名称的最佳方法是什么,但名称的分布是和现实世界一样。

如果前 10,000 行对数据结构产生影响,我只会使用它。

我尝试过查看有关加权随机性的其他一些问题,但在将理论转化为代码时遇到了一些麻烦。我对数学理论了解不多,所以我不知道这是否是“有或没有替换”随机选择,我希望同一个名字能够多次出现,无论这意味着什么。

I am trying to write a program to select a random name from the US Census last name list. The list format is

Name           Weight Cumulative line
-----          -----  -----      -
SMITH          1.006  1.006      1
JOHNSON        0.810  1.816      2
WILLIAMS       0.699  2.515      3
JONES          0.621  3.136      4
BROWN          0.621  3.757      5
DAVIS          0.480  4.237      6

Assuming I load the data in to a structure like

Class Name
{
    public string Name {get; set;}
    public decimal Weight {get; set;}
    public decimal Cumulative {get; set;}
}

What data structure would be best to hold the list of names, and what would be the best way to select a random name from the list but have the distribution of names be the same as the real world.

I will only be working with the first 10,000 rows if it makes a difference in the data structure.

I have tried looking at some of the other questions about weighted randomness but I am having a bit of trouble turning theory in to code. I do not know much about math theory so I do not know if this is a "With or without replacement" random selection, I want the same name able to show up more than once, which ever that one means.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

风筝有风,海豚有海 2024-12-11 00:14:39

处理这个问题的“最简单”方法是将其保存在一个列表中。

然后您可以使用:

Name GetRandomName(Random random, List<Name> names)
{
    double value = random.NextDouble() * names[names.Count-1].Culmitive;
    return names.Last(name => name.Culmitive <= value);
}

如果速度是一个问题,您可以存储一个仅包含 Culmitive 值的单独数组。这样,您可以使用 Array.BinarySearch 快速找到适当的索引:

Name GetRandomName(Random random, List<Name> names, double[] culmitiveValues)
{
    double value = random.NextDouble() * names[names.Count-1].Culmitive;
    int index = Array.BinarySearch(culmitiveValues, value);
    if (index >= 0)
        index = ~index;

    return names[index];
}

另一个选项(可能是最有效的)是使用类似 C5 通用集合库树类。然后,您可以使用 RangeFrom 来查找适当的名称。这样做的好处是不需要单独收集

The "easiest" way to handle this would be to keep this in a list.

You could then just use:

Name GetRandomName(Random random, List<Name> names)
{
    double value = random.NextDouble() * names[names.Count-1].Culmitive;
    return names.Last(name => name.Culmitive <= value);
}

If speed is a concern, you could store a separate array of just the Culmitive values. With this, you could use Array.BinarySearch to quickly find the appropriate index:

Name GetRandomName(Random random, List<Name> names, double[] culmitiveValues)
{
    double value = random.NextDouble() * names[names.Count-1].Culmitive;
    int index = Array.BinarySearch(culmitiveValues, value);
    if (index >= 0)
        index = ~index;

    return names[index];
}

Another option, which is probably the most efficient, would be to use something like one of the C5 Generic Collection Library's tree classes. You could then use RangeFrom to find the appropriate name. This has the advantage of not requiring a separate collection

谁与争疯 2024-12-11 00:14:39

我创建了用于随机选择加权项的 C# 库

  • 它实现了树选择和 walker 别名方法算法,为所有用例提供最佳性能。
  • 它经过单元测试和优化。
  • 它有 LINQ 支持。
  • 它是免费且开源的,并根据 MIT 许可证获得许可。

一些示例代码:

IWeightedRandomizer<string> randomizer = new DynamicWeightedRandomizer<string>();
randomizer["Joe"] = 1;
randomizer["Ryan"] = 2;
randomizer["Jason"] = 2;

string name1 = randomizer.RandomWithReplacement();
//name1 has a 20% chance of being "Joe", 40% of "Ryan", 40% of "Jason"

string name2 = randomizer.RandomWithRemoval();
//Same as above, except whichever one was chosen has been removed from the list.

I've created a C# library for randomly selected weighted items.

  • It implements both the tree-selection and walker alias method algorithms, to give the best performance for all use-cases.
  • It is unit-tested and optimized.
  • It has LINQ support.
  • It's free and open-source, licensed under the MIT license.

Some example code:

IWeightedRandomizer<string> randomizer = new DynamicWeightedRandomizer<string>();
randomizer["Joe"] = 1;
randomizer["Ryan"] = 2;
randomizer["Jason"] = 2;

string name1 = randomizer.RandomWithReplacement();
//name1 has a 20% chance of being "Joe", 40% of "Ryan", 40% of "Jason"

string name2 = randomizer.RandomWithRemoval();
//Same as above, except whichever one was chosen has been removed from the list.
泼猴你往哪里跑 2024-12-11 00:14:39

只是为了好玩,而且绝不是最佳选择

List<Name> Names = //Load your structure into this

List<String> NameBank = new List<String>();
foreach(Name name in Names)
   for(int i = 0; i <= (int)(name.Weight*1000); i++)
     NameBank.Add(name.Name)

String output = NameBank[rand(NameBank.Count)];

Just for fun, and in no way optimal

List<Name> Names = //Load your structure into this

List<String> NameBank = new List<String>();
foreach(Name name in Names)
   for(int i = 0; i <= (int)(name.Weight*1000); i++)
     NameBank.Add(name.Name)

then:

String output = NameBank[rand(NameBank.Count)];
枯寂 2024-12-11 00:14:39

我想说一个数组(如果你愿意,可以是向量)最好保存它们。至于加权平均值,求总和,在零和总和之间选择一个随机数,并选择累积值较小的姓氏。 (例如,这里,<1.006 = 史密斯,1.006-1.816 = 约翰逊,等等。

PS 它是累积的。

I'd say an array (vectors if you prefer) would be best to hold them. As for the weighted average, find the sum, pick a random number between zero and the sum, and pick the last name whose cumulative value is less. (e.g. here, <1.006 = smith, 1.006-1.816 = johnson, etc.

P.S. it's Cumulative.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文