Select a random item from a weighted list
I am trying to write a program to select a random name from the US Census last name list. The list format is
Name      Weight  Cumulative  Line
--------  ------  ----------  ----
SMITH     1.006   1.006       1
JOHNSON   0.810   1.816       2
WILLIAMS  0.699   2.515       3
JONES     0.621   3.136       4
BROWN     0.621   3.757       5
DAVIS     0.480   4.237       6
Assuming I load the data into a structure like
class Name
{
    public string Name { get; set; }
    public decimal Weight { get; set; }
    public decimal Cumulative { get; set; }
}
What data structure would be best to hold the list of names, and what would be the best way to select a random name from the list so that the distribution of names matches the real world?
I will only be working with the first 10,000 rows, if that makes a difference to the data structure.
I have tried looking at some of the other questions about weighted randomness, but I am having a bit of trouble turning theory into code. I do not know much about math theory, so I do not know if this is a "with or without replacement" random selection; I want the same name to be able to show up more than once, whichever one that means.
Comments (4)
The "easiest" way to handle this would be to keep the names in a list.
You could then just use:
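The snippet for this step is not preserved in this copy, so what follows is only a minimal sketch of one way to do it: a linear scan over a list sorted by ascending cumulative value. `NameEntry`, `LinearPicker`, `Pick`, and `PickAt` are hypothetical names introduced for illustration; `NameEntry` stands in for the question's class, renamed because C# does not allow a member with the same name as its enclosing type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's class (see the note above).
public class NameEntry
{
    public string Name { get; set; }
    public decimal Weight { get; set; }
    public decimal Cumulative { get; set; }
}

public static class LinearPicker
{
    // Assumes entries are sorted by ascending Cumulative, as in the census file.
    public static NameEntry Pick(IReadOnlyList<NameEntry> entries, Random random)
    {
        // Scale a uniform value in [0, 1) to [0, last cumulative value).
        decimal total = entries[entries.Count - 1].Cumulative;
        decimal target = (decimal)random.NextDouble() * total;
        return PickAt(entries, target);
    }

    // Returns the first entry whose cumulative value is greater than target.
    public static NameEntry PickAt(IReadOnlyList<NameEntry> entries, decimal target)
    {
        foreach (NameEntry entry in entries)
        {
            if (target < entry.Cumulative)
                return entry;
        }
        // Guard against rounding at the very top of the range.
        return entries[entries.Count - 1];
    }
}
```

Calling `LinearPicker.Pick(names, new Random())` repeatedly would return independent draws. Each call walks the list from the start, which is probably acceptable for 10,000 rows but grows linearly with the list size.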
If speed is a concern, you could store a separate array of just the Cumulative values. With this, you could use Array.BinarySearch to quickly find the appropriate index.
Another option, which is probably the most efficient, would be to use something like one of the C5 Generic Collection Library's tree classes. You could then use RangeFrom to find the appropriate name. This has the advantage of not requiring a separate collection.
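The code for the Array.BinarySearch variant is likewise missing from this copy. A hedged sketch of how it might look is below; `BinarySearchPicker`, `PickIndex`, and `IndexFor` are hypothetical names, and the array is assumed to hold the cumulative values in ascending order. `Array.BinarySearch` returns the index of an exact match, or the bitwise complement of the index of the first larger element when there is no match, which is what the sketch relies on.

```csharp
using System;

public static class BinarySearchPicker
{
    // cumulative: ascending cumulative weights, one per name, in list order.
    public static int PickIndex(decimal[] cumulative, Random random)
    {
        decimal total = cumulative[cumulative.Length - 1];
        decimal target = (decimal)random.NextDouble() * total;
        return IndexFor(cumulative, target);
    }

    // Index of the first element that is strictly greater than target.
    public static int IndexFor(decimal[] cumulative, decimal target)
    {
        int index = Array.BinarySearch(cumulative, target);
        if (index < 0)
        {
            // No exact match: the complement is the first larger element.
            index = ~index;
        }
        else
        {
            // Exact hit on a boundary: the value belongs to the next interval.
            index++;
        }
        // Skip repeated cumulative values (rows whose weight rounds to zero).
        while (index < cumulative.Length && cumulative[index] <= target)
        {
            index++;
        }
        return Math.Min(index, cumulative.Length - 1);
    }
}
```

The returned index can then be used to look up the entry in the names list, so each draw costs a logarithmic number of comparisons instead of a full scan.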
I've created a C# library for randomly selecting weighted items.
Some example code:
Just for fun, and in no way optimal
then:
I'd say an array (or a vector, if you prefer) would be best to hold them. As for the weighted selection, find the sum of the weights, pick a random number between zero and the sum, and pick the first surname whose cumulative value is greater than that number (e.g. here, <1.006 = Smith, 1.006-1.816 = Johnson, etc.).
P.S. It's spelled "Cumulative".
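As a rough illustration of the sum-then-pick idea, the sketch below works from the raw weights alone, subtracting each weight from the random target until it drops below zero. It also tallies repeated draws. Each draw is independent, so the same name can come up again, which is what "with replacement" means. `WeightedDraw`, `PickIndex`, and `Tally` are hypothetical names, and `double` is used here purely for brevity.

```csharp
using System;

public static class WeightedDraw
{
    // Picks index i with probability weights[i] / sum(weights).
    public static int PickIndex(double[] weights, Random random)
    {
        double total = 0;
        foreach (double w in weights)
        {
            total += w;
        }
        // Uniform target in [0, total).
        double target = random.NextDouble() * total;
        for (int i = 0; i < weights.Length; i++)
        {
            // Walk through the intervals by subtracting each weight in turn.
            target -= weights[i];
            if (target < 0)
                return i;
        }
        // Guard against floating-point rounding at the top of the range.
        return weights.Length - 1;
    }

    // Draws repeatedly (with replacement) and counts how often each index appears.
    public static int[] Tally(double[] weights, int draws, Random random)
    {
        var counts = new int[weights.Length];
        for (int n = 0; n < draws; n++)
        {
            counts[PickIndex(weights, random)]++;
        }
        return counts;
    }
}
```

With the six sample rows from the question, a large tally should come out roughly proportional to the weights, so SMITH would appear about 1.006 / 4.237 of the time.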