如何从单词词典中生成给定长度的随机文本行(装箱问题)?

发布于 2024-08-23 12:26:57 字数 2291 浏览 12 评论 0原文

我需要生成三行文本(本质上是胡言乱语),每行长度为 60 个字符,包括每行末尾的硬回车。这些行是根据不同长度(通常为 1-8 个字符)的单词词典生成的。任何单词不得使用多次,并且单词之间必须用空格分隔。我认为这本质上是一个装箱问题。

到目前为止我采取的方法是创建单词的 hashMap,并按其长度分组。然后,我选择一个随机长度,从地图中提取该长度的单词,并将其附加到我当前生成的行的末尾,考虑空格或硬回车。它大约有一半的时间有效,但另一半的时间我陷入无限循环并且我的程序崩溃。

我遇到的一个问题是:当我向行中添加随机单词时,给定长度的单词组可能会耗尽。这是因为字典中每个长度的单词数量不一定相同,例如,可能只有一个长度为1的单词。所以,我可能需要一个给定长度的单词,但不再有任何该长度的单词都可用。

以下是我迄今为止所掌握的内容的摘要。我正在使用 ActionScript,但希望能以任何语言深入了解这个问题。非常感谢。

dictionary // map of words with word lengths as keys and arrays of corresponding words as values
lengths // array of word lengths, sorted numerically
min = lengths[0] // minimum word length
max = lengths[lengths.length - 1] // maximum word length
line = ""
while ( line.length < 60 ) {
    len = lengths[round( rand() * ( lengths.length - 1 ) )]
    if ( dictionary[len] != null && dictionary[len].length > 0 ) {
        diff = 60 - line.length // number of characters needed to complete the line

        if ( line.length + len + 1 == 60 ) {
            // this word will complete the line exactly
            line += dictionary[len].splice(0, 1) + "\n"
        }
        else if ( min + max + 2 >= diff ) {
            // find the two word lengths that will complete the line
            // ==> this is where I'm having trouble
        }
        else if ( line.length + len + 1 < 60 - max ) {
            // this word will fit safely, so just add it
            line += dictionary[len].splice(0, 1) + " "
        }

        if ( dictionary[len].length == 0 ) {
            // delete any empty arrays and update min and max lengths accordingly
            dictionary[len] = null
            delete dictionary[len]

            i = lengths.indexOf( len )
            if ( i >= 0 ) {
                // words of this length have been depleted, so
                // update lengths array to ensure that next random
                // length is valid
                lengths.splice( i, 1 )
            }
            if ( lengths.indexOf( min ) == -1 ) {
                // update the min
                min = lengths[0]
            }
            if ( lengths.indexOf( max ) == -1 ) {
                // update the max
                max = lengths[lengths.length - 1]
            }
        }
    }
}

I need to generate three lines of text (essentially jibberish) that are each 60 characters long, including a hard return at the end of each line. The lines are generated from a dictionary of words of various lengths (typically 1-8 characters). No word may be used more than once, and words must be separated by spaces. I think this is essentially a bin-packing problem.

The approach I've taken so far is to create a hashMap of the words, grouped by their lengths. I then choose a random length, pull a word of that length from the map, and append it to the end of the line I'm currently generating, accounting for spaces or a hard return. It works about half the time, but the other half of the time I'm getting stuck in an infinite loop and my program crashes.

One problem I'm running into is this: as I add random words to the lines, groups of words of a given length may become depleted. This is because there are not necessarily the same number of words of each length in the dictionary, e.g., there may only be one word with a length of 1. So, I might need a word of a given length, but there are no longer any words of that length available.

Below is a summary of what I have so far. I'm working in ActionScript, but would appreciate insight into this problem in any language. Many thanks in advance.

dictionary // map of words with word lengths as keys and arrays of corresponding words as values
lengths // array of word lengths, sorted numerically
min = lengths[0] // minimum word length
max = lengths[lengths.length - 1] // maximum word length
line = ""
while ( line.length < 60 ) {
    len = lengths[round( rand() * ( lengths.length - 1 ) )]
    if ( dictionary[len] != null && dictionary[len].length > 0 ) {
        diff = 60 - line.length // number of characters needed to complete the line

        if ( line.length + len + 1 == 60 ) {
            // this word will complete the line exactly
            line += dictionary[len].splice(0, 1) + "\n"
        }
        else if ( min + max + 2 >= diff ) {
            // find the two word lengths that will complete the line
            // ==> this is where I'm having trouble
        }
        else if ( line.length + len + 1 < 60 - max ) {
            // this word will fit safely, so just add it
            line += dictionary[len].splice(0, 1) + " "
        }

        if ( dictionary[len].length == 0 ) {
            // delete any empty arrays and update min and max lengths accordingly
            dictionary[len] = null
            delete dictionary[len]

            i = lengths.indexOf( len )
            if ( i >= 0 ) {
                // words of this length have been depleted, so
                // update lengths array to ensure that next random
                // length is valid
                lengths.splice( i, 1 )
            }
            if ( lengths.indexOf( min ) == -1 ) {
                // update the min
                min = lengths[0]
            }
            if ( lengths.indexOf( max ) == -1 ) {
                // update the max
                max = lengths[lengths.length - 1]
            }
        }
    }
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

夜司空 2024-08-30 12:26:57
  1. 您应该将 n 个字母的单词视为 n+1 个字母,因为每个单词后面要么有空格,要么有回车。
  2. 由于所有单词的长度至少为 2 个字符,因此您永远不想填写 59 个字符。如果达到 57 个字符,则需要选择 2 个字母加回车的内容。如果达到 58,则需要 1 个字母的单词加上回车。
  3. 您是否正在尝试优化时间?同一个词可以出现多次吗?一行中多次?如果您的单词分布不均匀,例如很多行包含“a”或“I”,因为这些是英语中唯一的单字母单词,这有什么关系吗?

这是基本想法。对于每一行,开始选择单词长度,并跟踪到目前为止的单词长度和总字符数。当您接近行尾时,选择的单词长度小于剩余的字符数。 (例如,如果还剩 5 个字符,请选择 2-5 个字符范围内的单词,计算空格。)如果达到 57 个字符,请选择一个 3 个字母的单词(计算回车)。如果达到 58 个字符,请选择一个 2 个字母的单词(计算返回值)。

如果需要,您可以在此时打乱单词长度,这样所有行都不会以短单词结尾。然后对于每个单词长度,选择一个该长度的单词并将其插入。


  1. You should think of an n-letter word as being n+1 letters, because each word has either a space or return after it.

  2. Since all your words are at least 2 characters long, you don't ever want to get to a point where you have 59 characters filled in. If you get to 57, you need to pick something that is 2 letters plus the return. If you get to 58, you need a 1-letter word plus the return.

  3. Are you trying to optimize for time? Can you have the same word multiple times? Multiple times in one line? Does it matter if your words are not uniformly distributed, e.g. a lot of lines contain "a" or "I" because those are the only one-letter words in English?

Here's the basic idea. For each line, start choosing word lengths, and keep track of the word lengths and total character count so far. As you get toward the end of the line, choose word lengths less than the number of characters you have left. (e.g. if you have 5 characters left, choose words in the range of 2-5 characters, counting the space.) If you get to 57 characters, pick a 3-letter word (counting return). If you get to 58 characters, pick a 2-letter word (counting return.)

If you want, you can shuffle the word lengths at this point, so all your lines don't end with short words. Then for each word length, pick a word of that length and plug it in.

半﹌身腐败 2024-08-30 12:26:57
dictionnary = Group your words by lengths (like you already do)
total_length = 0
phrase = ""

while (total_length < 60){

random_length = generate_random_number(1,8)

if (total_length + random_length > 60)
{
  random_length = 60 - total_length // possibly - 1 if you cound \n and -2 if you 
                                    // append a blank anyway at the end
}

phrase += dictionnary.get_random_word_of_length(random_length) + " "
total_length += random_length + 1 

}
dictionnary = Group your words by lengths (like you already do)
total_length = 0
phrase = ""

while (total_length < 60){

random_length = generate_random_number(1,8)

if (total_length + random_length > 60)
{
  random_length = 60 - total_length // possibly - 1 if you cound \n and -2 if you 
                                    // append a blank anyway at the end
}

phrase += dictionnary.get_random_word_of_length(random_length) + " "
total_length += random_length + 1 

}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文