Randomly generating letters according to their frequency of use?
How can I randomly generate letters according to their frequency of use in common speech?
Any pseudo-code appreciated, but an implementation in Java would be fantastic. Otherwise just a poke in the right direction would be helpful.
Note: I don't need to generate the frequencies of usage - I'm sure I can look that up easily enough.
I am assuming that you store the frequencies as floating point numbers between 0 and 1 that total to make 1.
First you should prepare a table of cumulative frequencies, i.e. the sum of the frequency of that letter and all letters before it.
To simplify, if you start with this frequency distribution:
Your cumulative frequency table would be:
Now generate a random number between 0 and 1 and see where in this list that number lies. Choose the letter that has the smallest cumulative frequency larger than your random number. Some examples:
Say you randomly pick 0.612. This lies between 0.4 and 0.8, i.e. between B and C, so you'd choose C.
If your random number was 0.039, that comes before 0.1, i.e. before A, so choose A.
I hope that makes sense, otherwise feel free to ask for clarifications!
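The cumulative-table lookup described in this answer could be sketched in Java roughly as follows. This is only a minimal sketch: the class and method names are made up for illustration, and the four-letter distribution is an assumption. A, B and C match the cumulative values 0.1, 0.4 and 0.8 quoted in the examples, while D is a hypothetical letter that takes the remaining 0.2.

```java
import java.util.Random;

final class CumulativePicker {
    // Assumed example values: A, B and C match the cumulative figures quoted
    // in the text (0.1, 0.4, 0.8); D is a hypothetical letter taking the rest.
    static final char[] LETTERS = {'A', 'B', 'C', 'D'};
    static final double[] FREQUENCIES = {0.1, 0.3, 0.4, 0.2};

    // Running totals: entry i is the sum of freq[0..i].
    static double[] cumulative(double[] freq) {
        double[] cum = new double[freq.length];
        double sum = 0.0;
        for (int i = 0; i < freq.length; i++) {
            sum += freq[i];
            cum[i] = sum;
        }
        return cum;
    }

    // Returns the letter with the smallest cumulative frequency larger than r.
    static char pick(double[] cum, double r) {
        for (int i = 0; i < cum.length; i++) {
            if (r < cum[i]) {
                return LETTERS[i];
            }
        }
        // Guards against rounding leaving the last total slightly below 1.0.
        return LETTERS[LETTERS.length - 1];
    }

    public static void main(String[] args) {
        double[] cum = cumulative(FREQUENCIES);
        Random random = new Random();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < 20; i++) {
            out.append(pick(cum, random.nextDouble()));
        }
        System.out.println(out);
    }
}
```

Keeping the random draw outside pick makes it easy to check the two worked examples by hand: 0.612 should give C and 0.039 should give A.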
One quick way to do it would be to generate a list of letters, where each letter appears in the list in accordance with its frequency. Say, if "e" is used 25.6% of the time and your list has length 1000, it would contain 256 "e"s.
Then you could just randomly pick spots from the list by using
(int) (Math.random() * 1000)
to generate random indices between 0 and 999.
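A rough Java sketch of the list-based idea might look like this. Only the 25.6% figure for "e" comes from the answer; the other two letters and their frequencies are placeholders chosen so the total is 1, and the class name LetterPool is made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LetterPool {
    static final int POOL_SIZE = 1000;

    // Hypothetical frequencies: only 'e' = 0.256 is taken from the text.
    static Map<Character, Double> exampleFrequencies() {
        Map<Character, Double> freq = new LinkedHashMap<>();
        freq.put('e', 0.256);
        freq.put('t', 0.400);
        freq.put('a', 0.344);
        return freq;
    }

    // Each letter is added round(frequency * POOL_SIZE) times.
    static List<Character> buildPool(Map<Character, Double> frequencies) {
        List<Character> pool = new ArrayList<>();
        for (Map.Entry<Character, Double> e : frequencies.entrySet()) {
            long copies = Math.round(e.getValue() * POOL_SIZE);
            for (long i = 0; i < copies; i++) {
                pool.add(e.getKey());
            }
        }
        return pool;
    }

    static char pick(List<Character> pool) {
        // Uses the real pool size, since rounding can make it differ from POOL_SIZE.
        int index = (int) (Math.random() * pool.size());
        return pool.get(index);
    }

    public static void main(String[] args) {
        List<Character> pool = buildPool(exampleFrequencies());
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < 20; i++) {
            out.append(pick(pool));
        }
        System.out.println(out);
    }
}
```

The trade-off is memory for speed: a pick is a single array access, but the resolution is limited to 1/POOL_SIZE, so very rare letters may get zero slots unless the pool is made larger.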
What I would do is scale the relative frequencies to floating point numbers such that their sum is 1.0. Then I would create an array of the cumulative totals per letter, i.e. the combined frequency of that letter and all those "below" it. Say the frequency of A is 10%, B is 2% and Z is 1%; then your table would look something like this:
A  0.10
B  0.12
...
Z  1.00
Then you generate a random number between 0.0 and 1.0 and do a binary search in the array for the first cumulative total larger than your random number. Then pick the letter at that position. Done.
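One way the binary search could be written in Java is with Arrays.binarySearch, which returns either the index of an exact match or an encoded insertion point. The sketch below uses a hypothetical three-letter alphabet (A 0.5, B 0.3, C 0.2) rather than real English frequencies, and the class name is made up.

```java
import java.util.Arrays;
import java.util.Random;

final class BinarySearchPicker {
    // Hypothetical three-letter alphabet, for illustration only.
    static final char[] LETTERS = {'A', 'B', 'C'};
    static final double[] FREQUENCIES = {0.5, 0.3, 0.2};

    // Cumulative totals in ascending order, one per letter.
    static double[] cumulative(double[] freq) {
        double[] cum = new double[freq.length];
        double sum = 0.0;
        for (int i = 0; i < freq.length; i++) {
            sum += freq[i];
            cum[i] = sum;
        }
        return cum;
    }

    // Index of the first cumulative total strictly greater than r.
    static int indexFor(double[] cum, double r) {
        int idx = Arrays.binarySearch(cum, r);
        // Exact hit: r equals cum[idx], so the next entry is the first larger one.
        // Miss: binarySearch returns -(insertionPoint) - 1.
        int pos = (idx >= 0) ? idx + 1 : -idx - 1;
        // Clamp in case rounding left the final total slightly below 1.0.
        return Math.min(pos, cum.length - 1);
    }

    static char pick(double[] cum, double r) {
        return LETTERS[indexFor(cum, r)];
    }

    public static void main(String[] args) {
        double[] cum = cumulative(FREQUENCIES);
        Random random = new Random();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < 20; i++) {
            out.append(pick(cum, random.nextDouble()));
        }
        System.out.println(out);
    }
}
```

For a 26-letter alphabet the gain over a linear scan is small, but the same code works unchanged for much larger symbol sets.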
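Not even pseudo-code, but a possible approach is as follows:
Let p1, p2, ..., pk be the frequencies that you want to match.
Compute the cumulative sums p1, p1+p2, ..., p1+...+pk = 1, generate a uniform random number x between 0 and 1, and output the letter whose interval between consecutive sums contains x.
Depending on how you implement the interval-finding, the procedure is usually more efficient if p1, p2, ... are sorted in decreasing order, because you will usually find the interval containing x sooner.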
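To illustrate the point about ordering, here is a small Java sketch that sorts the letters by decreasing frequency and then walks the cumulative sum until it passes x. The three letters and their probabilities are hypothetical, and the Entry helper class is made up for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

final class DescendingScan {
    static final class Entry {
        final char letter;
        final double p;

        Entry(char letter, double p) {
            this.letter = letter;
            this.p = p;
        }
    }

    // Hypothetical frequencies, deliberately not in sorted order.
    static Entry[] exampleEntries() {
        return new Entry[] {
            new Entry('a', 0.2), new Entry('e', 0.5), new Entry('t', 0.3)
        };
    }

    // Returns a copy ordered from most to least frequent.
    static Entry[] sortedDescending(Entry[] entries) {
        Entry[] copy = entries.clone();
        Arrays.sort(copy, Comparator.comparingDouble((Entry e) -> e.p).reversed());
        return copy;
    }

    // Walks the running sum; frequent letters are checked first,
    // so the loop usually exits after only a few iterations.
    static char pick(Entry[] sorted, double x) {
        double upper = 0.0;
        for (Entry e : sorted) {
            upper += e.p;
            if (x < upper) {
                return e.letter;
            }
        }
        return sorted[sorted.length - 1].letter; // rounding safety net
    }

    public static void main(String[] args) {
        Entry[] sorted = sortedDescending(exampleEntries());
        Random random = new Random();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < 20; i++) {
            out.append(pick(sorted, random.nextDouble()));
        }
        System.out.println(out);
    }
}
```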
Using a binary tree gives you a nice, clean way to find the right entry. Here, you start with a frequency map, where the keys are the symbols (English letters) and the values are the frequencies of their occurrence. This gets inverted, and a NavigableMap is created where the keys are cumulative probabilities and the values are symbols. That makes the lookup easy.
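The code for this answer did not survive on this page, so the following is only a guess at what such an implementation could look like, not the author's original. It uses TreeMap as the NavigableMap and its higherEntry method for the lookup; the class name, helper names and the three example frequencies are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Random;
import java.util.TreeMap;

final class NavigableMapPicker {
    // Hypothetical frequency map: symbol -> probability, summing to 1.
    static Map<Character, Double> exampleFrequency() {
        Map<Character, Double> frequency = new LinkedHashMap<>();
        frequency.put('e', 0.5);
        frequency.put('t', 0.3);
        frequency.put('a', 0.2);
        return frequency;
    }

    // Inverts the map: cumulative probability -> symbol.
    // Note: a symbol with frequency 0 would overwrite the previous key.
    static NavigableMap<Double, Character> invert(Map<Character, Double> frequency) {
        NavigableMap<Double, Character> cumulative = new TreeMap<>();
        double total = 0.0;
        for (Map.Entry<Character, Double> e : frequency.entrySet()) {
            total += e.getValue();
            cumulative.put(total, e.getKey());
        }
        return cumulative;
    }

    // higherEntry finds the least key strictly greater than r.
    static char pick(NavigableMap<Double, Character> cumulative, double r) {
        Map.Entry<Double, Character> entry = cumulative.higherEntry(r);
        // Fall back to the last symbol if rounding left the total below r.
        return (entry != null) ? entry.getValue() : cumulative.lastEntry().getValue();
    }

    public static void main(String[] args) {
        NavigableMap<Double, Character> cumulative = invert(exampleFrequency());
        Random random = new Random();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < 20; i++) {
            out.append(pick(cumulative, random.nextDouble()));
        }
        System.out.println(out);
    }
}
```

Since TreeMap is a red-black tree, each lookup is logarithmic in the number of symbols, and the map handles the search logic so there is no hand-written binary search to get wrong.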