当前位置：文江博客话题详情

根据使用频率随机生成字母？

发布于 2024-08-19 13:15:12 字数 126 浏览 9 评论 0原文

如何根据常用语音中的使用频率随机生成字母？

任何伪代码都值得赞赏，但如果用 Java 实现就更棒了。否则，只需朝正确的方向戳一下就会有所帮助。

注意：我不需要生成使用频率 - 我确信我可以很容易地查找到它。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

被翻牌 2024-08-26 13:15:12

我假设您将频率存储为 0 到 1 之间的浮点数，总计为 1。

首先，您应该准备一个累积频率表，即该字母及其之前所有字母的频率之和。

为了简化，如果您从这个频率分布开始：

A  0.1
B  0.3
C  0.4
D  0.2

您的累积频率表将是：

A  0.1
B  0.4 (= 0.1 + 0.3)
C  0.8 (= 0.1 + 0.3 + 0.4)
D  1.0 (= 0.1 + 0.3 + 0.4 + 0.2)

现在生成一个 0 到 1 之间的随机数，并查看该数字在此列表中的位置。选择累积频率最小且大于随机数的字母。一些例子：

假设您随机选择 0.612。它位于 0.4 和 0.8 之间，即 B 和 C 之间，所以你会选择 C。

如果你的随机数是 0.039，它在 0.1 之前，即在 A 之前，所以选择 A。

我希望这是有道理的，否则请随意要求澄清！

I am assuming that you store the frequencies as floating point numbers between 0 and 1 that total to make 1.

First you should prepare a table of cumulative frequencies, i.e. the sum of the frequency of that letter and all letters before it.

To simplify, if you start with this frequency distribution:

A  0.1
B  0.3
C  0.4
D  0.2

Your cumulative frequency table would be:

A  0.1
B  0.4 (= 0.1 + 0.3)
C  0.8 (= 0.1 + 0.3 + 0.4)
D  1.0 (= 0.1 + 0.3 + 0.4 + 0.2)

Now generate a random number between 0 and 1 and see where in this list that number lies. Choose the letter that has the smallest cumulative frequency larger than your random number. Some examples:

Say you randomly pick 0.612. This lies between 0.4 and 0.8, i.e. between B and C, so you'd choose C.

If your random number was 0.039, that comes before 0.1, i.e. before A, so choose A.

I hope that makes sense, otherwise feel free to ask for clarifications!

回复收藏 0 原文

执手闯天涯 2024-08-26 13:15:12

一种快速的方法是生成一个字母列表，其中每个字母根据其频率出现在列表中。假设，如果“e”的使用率为 25.6%，并且您的列表长度为 1000，则它将有 256 个“e”。

然后，您可以使用 (int) (Math.random() * 1000) 从列表中随机选择点，生成 0 到 999 之间的随机数。

回复收藏 0 原文

￡烟消云散 2024-08-26 13:15:12

我要做的是将相对频率缩放为浮点数，使它们的总和为 1.0。然后，我将创建一个包含每个字母的累积总数的数组，即必须位于顶部才能获得该字母及其“下方”的所有数字。假设A的频率为10%，b为2%，z为1%；那么你的表将如下所示：

0.000 A ; from 0% to 10% gets you an A
0.100 B ; above 10% is at least a B
0.120 C ; 12% for C...
...
0.990 Z ; if your number is >= 99% then you get a Z

然后你自己生成一个 0.0 到 1.0 之间的随机数，并在数组中进行二分搜索，查找第一个小于随机数的数字。然后选择该位置的字母。完毕。

What I would do is scale the relative frequencies as floating point numbers such that their sum is 1.0. Then I would create an array of the cumulative totals per letter, i.e. the number that must be topped to get that letter and all those "below" it. Say the frequency of A is 10%, b is 2% and z is 1%; then your table would look something like this:

0.000 A ; from 0% to 10% gets you an A
0.100 B ; above 10% is at least a B
0.120 C ; 12% for C...
...
0.990 Z ; if your number is >= 99% then you get a Z

Then you generate yourself a random number between 0.0 and 1.0 and do a binary search in the array for the first number smaller than your random number. Then pick the letter at that position. Done.

回复收藏 0 原文

不知在何时 2024-08-26 13:15:12

甚至不是伪代码，但可能的方法如下：

令 p1, p2, ..., pk 为您想要匹配的频率。

计算累积频率： p1, p1+p2, p1+p2+p3, ... , 1
生成随机均匀 (0,1) 数 x
检查累积频率 x 属于哪个区间：如果介于之间，例如、 p1+..+pi 和 p1+...+pi+p(i+1)，然后输出第 (i+1) 个字母

根据您如何实现区间查找，如果 p1 ,p2,... 按降序排序，因为您通常会更快找到包含 x 的区间。

回复收藏 0 原文

晨与橙与城 2024-08-26 13:15:12

使用二叉树为您提供了一种很好、干净的方法来找到正确的条目。在这里，您从频率图开始，其中键是符号（英文字母），值是它们出现的频率。这会被反转，并创建一个 NavigableMap，其中键是累积概率，值是符号。这使得查找变得容易。

  private final Random generator = new Random();

  private final NavigableMap<Float, Integer> table = 
    new TreeMap<Float, Integer>();

  private final float max;

  public Frequency(Map<Integer, Float> frequency)
  {
    float total = 0;
    for (Map.Entry<Integer, Float> e : frequency.entrySet()) {
      total += e.getValue();
      table.put(total, e.getKey());
    }
    max = total;
  }

  /** 
   * Choose a random symbol. The choices are weighted by frequency.
   */ 
  public int roll()
  {
    Float key = generator.nextFloat() * max;
    return table.higherEntry(key).getValue();
  }

Using a binary tree gives you a nice, clean way to find the right entry. Here, you start with a frequency map, where the keys are the symbols (English letters), and the values are the frequency of their occurrence. This gets inverted, and a NavigableMap is created where the keys are cumulative probability, and the values are symbols. That makes the lookup easy.

  private final Random generator = new Random();

  private final NavigableMap<Float, Integer> table = 
    new TreeMap<Float, Integer>();

  private final float max;

  public Frequency(Map<Integer, Float> frequency)
  {
    float total = 0;
    for (Map.Entry<Integer, Float> e : frequency.entrySet()) {
      total += e.getValue();
      table.put(total, e.getKey());
    }
    max = total;
  }

  /** 
   * Choose a random symbol. The choices are weighted by frequency.
   */ 
  public int roll()
  {
    Float key = generator.nextFloat() * max;
    return table.higherEntry(key).getValue();
  }

回复收藏 0 原文

~没有更多了~