Picking names from a dictionary by probability

Posted on 2025-02-06 08:01:12

Let's say I have a dictionary:

{
    'us': {
        'male': {'given_names': ['Alex', 'Bob', 'Charlie']},
        'female': {'given_names': ['Alice', 'Betty', 'Claire']},
    },
    'uk': {
        'male': {'given_names': ['aaa', 'Bbb', 'cc']},
        'female': {'given_names': ['ppp', 'ddd', 'sss']},
    },
}

Now let's say I want to get 60% US names and 40% UK names, with a 50/50 split between male and female names.

How can I do it?

My current approach: I am trying to think of something similar to this, but I guess it is more complex than that.

I was thinking of collecting all the names first and then applying a distribution to them, but I can't make that work logically. Can someone help?

        # all_possible_names = [
        #     name
        #     for list_of_names in [
        #         self.library[area][gender][
        #             "given_names"
        #         ]
        #         for gender in self.genders
        #         for area in self.name_areas
        #     ]
        #     for name in list_of_names
        # ]
        # print(all_possible_names)

Thanks.


Comments (2)

孤独难免 2025-02-13 08:01:12

Use random.choices with weights to pick the country, and random.choice to split evenly between male and female. Assuming your dictionary is named d and N is the total number of names you'd like:

from random import choice, choices

N = 3

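names = [
    # Pick a gender uniformly, then a name uniformly from that list.
    choice(d[country][choice(['male', 'female'])]['given_names'])
    # k=N draws one weighted country per name; without k, choices()
    # returns a single country that would be reused for every name.
    for country in choices(['us', 'uk'], weights=[0.6, 0.4], k=N)
]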
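
The idea from the question of collecting all the names first can also work, as long as each name carries its own weight. A minimal sketch, assuming names within one country/gender list should be equally likely: each name gets country weight × gender weight ÷ list length. The helper flatten_with_weights and the two weight dictionaries are hypothetical names introduced for illustration, not part of the answer above.

```python
import random

# Same shape as the dictionary in the question.
d = {
    "us": {
        "male": {"given_names": ["Alex", "Bob", "Charlie"]},
        "female": {"given_names": ["Alice", "Betty", "Claire"]},
    },
    "uk": {
        "male": {"given_names": ["aaa", "Bbb", "cc"]},
        "female": {"given_names": ["ppp", "ddd", "sss"]},
    },
}

# Assumed target shares (from the question).
country_weights = {"us": 0.6, "uk": 0.4}
gender_weights = {"male": 0.5, "female": 0.5}


def flatten_with_weights(library, country_weights, gender_weights):
    """Return parallel lists of names and per-name probabilities."""
    names, weights = [], []
    for country, cw in country_weights.items():
        for gender, gw in gender_weights.items():
            pool = library[country][gender]["given_names"]
            for name in pool:
                names.append(name)
                # Spread the group's share evenly over its names.
                weights.append(cw * gw / len(pool))
    return names, weights


names, weights = flatten_with_weights(d, country_weights, gender_weights)

# One weighted draw (with replacement) over the flat list.
sample = random.choices(names, weights=weights, k=10)
print(sample)
```

This gives the same distribution as picking country, then gender, then name, but in a single call to random.choices.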

╰◇生如夏花灿烂 2025-02-13 08:01:12

You can use numpy's random.choice to do the weighted distribution:

from numpy.random import choice as npchoice
from random import choice


some_dict = {
    "us": {
        "male": {"given_names": ["Alex", "Bob", "Charlie"]},
        "female": {"given_names": ["Alice", "Betty", "Claire"]},
    },
    "uk": {
        "male": {"given_names": ["aaa", "Bbb", "cc"]},
        "female": {"given_names": ["ppp", "ddd", "sss"]},
    },
}


possible_choices = ["us", "uk"]
probability_distribution = [0.6, 0.4]
number_of_items_to_pick = 200
countries = list(
    npchoice(possible_choices, number_of_items_to_pick, p=probability_distribution)
)
print(countries)


names = []
females = 0
males = 0
for country in countries:
    gender = choice(["male", "female"])
    if gender == "female":
        females += 1
    else:
        males += 1
    name = choice(some_dict[country][gender]["given_names"])
    names.append(name)
    print(f"{country} | {gender:.1} | {name}")


print(f"\nF: {females}  | M: {males}")
print(f"US: {countries.count('us')} | UK: {countries.count('uk')}")

I added some extra logic above for testing and to check the distribution.
It can be shortened to the logic below:

from numpy.random import choice as npchoice
from random import choice

names = [
    choice(some_dict[country][choice(["male", "female"])]["given_names"])
    for country in npchoice(["us", "uk"], 200, p=[0.6, 0.4])
]
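
Note that random draws only hit 60/40 and 50/50 on average, so the counts printed by the test loop will fluctuate from run to run. If the split has to hold exactly in the result, one option is to fix the counts up front and only randomize which names fill them. The sketch below is an assumption about what "exactly" should mean, not something stated in the question: exact_quota_names is a hypothetical helper, counts are rounded down, and any remainder goes to the last country and the last gender.

```python
import random

# Same shape as the dictionary in the question.
some_dict = {
    "us": {
        "male": {"given_names": ["Alex", "Bob", "Charlie"]},
        "female": {"given_names": ["Alice", "Betty", "Claire"]},
    },
    "uk": {
        "male": {"given_names": ["aaa", "Bbb", "cc"]},
        "female": {"given_names": ["ppp", "ddd", "sss"]},
    },
}


def exact_quota_names(library, n, country_share, genders=("male", "female"), seed=None):
    """Return n names with fixed per-country and per-gender counts."""
    rng = random.Random(seed)
    countries = list(country_share)
    names = []
    remaining = n
    for i, country in enumerate(countries):
        is_last_country = i == len(countries) - 1
        # Round down; the last country absorbs whatever is left.
        quota = remaining if is_last_country else int(n * country_share[country])
        remaining -= quota
        base = quota // len(genders)
        for j, gender in enumerate(genders):
            is_last_gender = j == len(genders) - 1
            # The last gender absorbs any odd remainder.
            count = quota - base * (len(genders) - 1) if is_last_gender else base
            pool = library[country][gender]["given_names"]
            # Names are drawn with replacement, as in the answers above.
            names.extend(rng.choices(pool, k=count))
    # Shuffle so the result is not grouped by country and gender.
    rng.shuffle(names)
    return names


names = exact_quota_names(some_dict, 20, {"us": 0.6, "uk": 0.4}, seed=0)
print(names)
```

With n = 20 this produces 12 US and 8 UK names, split evenly by gender within each country; only the specific names and their order are random.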