Picking from a dictionary by probability

Posted 2025-02-06 08:01:12


Let's say I have a dictionary:

{
    'us': {
        'male': {'given_names': ['Alex', 'Bob', 'Charlie']},
        'female': {'given_names': ['Alice', 'Betty', 'Claire']},
    },
    'uk': {
        'male': {'given_names': ['aaa', 'Bbb', 'cc']},
        'female': {'given_names': ['ppp', 'ddd', 'sss']},
    },
}

Now let's say I want to get 60% US names and 40% UK names, with a 50/50 split between male and female names.

How can I do it?

My current approach: I am trying to think of something similar to this, but I guess it is more complex than that.

I was thinking of getting all the names first and then applying a distribution to them, but that does not quite make logical sense to me. Can someone help?

        # Inside a method: flatten every given-name list into a single list.
        all_possible_names = [
            name
            for gender in self.genders
            for area in self.name_areas
            for name in self.library[area][gender]["given_names"]
        ]
        print(all_possible_names)

Thanks.


孤独难免 2025-02-13 08:01:12


Use random.choices with weights to pick the country and random.choice to split between male and female. Assuming your dictionary is named d and N is the total number of names you'd like:

from random import choice, choices

N = 3

# choices() returns a single item unless k is given, so pass k=N
# to draw a fresh weighted country for every name.
names = [
    choice(d[country][choice(['male', 'female'])]['given_names'])
    for country in choices(['us', 'uk'], weights=[0.6, 0.4], k=N)
]
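
The question's idea of collecting every name first can also be made to work, as long as each name carries its own weight. The sketch below is one possible way to do that, not the only one: it gives each name the weight P(country) * P(gender) / (size of that name list) and makes a single random.choices call. The names country_weights, gender_weights and weighted_names are made up for this illustration, and the shares are the ones assumed in the question.

```python
from random import choices

# Same structure as the dictionary in the question, named d for brevity.
d = {
    "us": {
        "male": {"given_names": ["Alex", "Bob", "Charlie"]},
        "female": {"given_names": ["Alice", "Betty", "Claire"]},
    },
    "uk": {
        "male": {"given_names": ["aaa", "Bbb", "cc"]},
        "female": {"given_names": ["ppp", "ddd", "sss"]},
    },
}

# Assumed target shares (hypothetical names, values taken from the question).
country_weights = {"us": 0.6, "uk": 0.4}
gender_weights = {"male": 0.5, "female": 0.5}


def weighted_names(data, n):
    """Draw n names from the flattened pool using per-name weights."""
    population, weights = [], []
    for country, cw in country_weights.items():
        for gender, gw in gender_weights.items():
            given = data[country][gender]["given_names"]
            for name in given:
                population.append(name)
                # Split the cell's probability evenly over its names.
                weights.append(cw * gw / len(given))
    return choices(population, weights=weights, k=n)


names = weighted_names(d, 10)
print(names)
```

Like the answer above, this only hits the 60/40 and 50/50 shares on average; any single run can deviate from them.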


╰◇生如夏花灿烂 2025-02-13 08:01:12


You can use numpy's random.choice to do the weighted country draw:

from numpy.random import choice as npchoice
from random import choice


some_dict = {
    "us": {
        "male": {"given_names": ["Alex", "Bob", "Charlie"]},
        "female": {"given_names": ["Alice", "Betty", "Claire"]},
    },
    "uk": {
        "male": {"given_names": ["aaa", "Bbb", "cc"]},
        "female": {"given_names": ["ppp", "ddd", "sss"]},
    },
}


possible_choices = ["us", "uk"]
probability_distribution = [0.6, 0.4]
number_of_items_to_pick = 200
countries = list(
    npchoice(possible_choices, number_of_items_to_pick, p=probability_distribution)
)
print(countries)


names = []
females = 0
males = 0
for country in countries:
    gender = choice(["male", "female"])
    if gender == "female":
        females += 1
    else:
        males += 1
    name = choice(some_dict[country][gender]["given_names"])
    names.append(name)
    print(f"{country} | {gender:.1} | {name}")


print(f"\nF: {females}  | M: {males}")
print(f"US: {countries.count('us')} | UK: {countries.count('uk')}")

I added some logic above for my testing, and to check the distribution.
It can be shortened to the logic below:

from numpy.random import choice as npchoice
from random import choice

names = [
    choice(some_dict[country][choice(["male", "female"])]["given_names"])
    for country in npchoice(["us", "uk"], 200, p=[0.6, 0.4])
]
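
Random draws like the ones above only approximate the requested shares. If the counts have to be exact, one option is to fix the quota per country up front, alternate genders within each quota, and shuffle at the end. This is a minimal sketch under that assumption; exact_split and its parameters are hypothetical names, and it uses only the standard library.

```python
import random

some_dict = {
    "us": {
        "male": {"given_names": ["Alex", "Bob", "Charlie"]},
        "female": {"given_names": ["Alice", "Betty", "Claire"]},
    },
    "uk": {
        "male": {"given_names": ["aaa", "Bbb", "cc"]},
        "female": {"given_names": ["ppp", "ddd", "sss"]},
    },
}


def exact_split(data, n, country_share, seed=None):
    """Return (country, gender, name) tuples with fixed per-country quotas."""
    rng = random.Random(seed)
    picked = []
    for country, share in country_share.items():
        # Rounding may make the total differ from n for awkward shares.
        quota = round(n * share)
        for i in range(quota):
            # Alternate genders so each country is split as evenly as possible.
            gender = "male" if i % 2 == 0 else "female"
            name = rng.choice(data[country][gender]["given_names"])
            picked.append((country, gender, name))
    rng.shuffle(picked)  # remove the country-by-country ordering
    return picked


picked = exact_split(some_dict, 200, {"us": 0.6, "uk": 0.4}, seed=0)
print(sum(1 for country, _, _ in picked if country == "us"))
print(sum(1 for _, gender, _ in picked if gender == "male"))
```

Only the individual names stay random here; the country and gender counts are fixed by construction, which may or may not be what the question needs.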
