如何从单词词典中生成给定长度的随机文本行(装箱问题)?
我需要生成三行文本(本质上是胡言乱语),每行长度为 60 个字符,包括每行末尾的硬回车。这些行是根据不同长度(通常为 1-8 个字符)的单词词典生成的。任何单词不得使用多次,并且单词之间必须用空格分隔。我认为这本质上是一个装箱问题。
到目前为止我采取的方法是创建单词的 hashMap,并按其长度分组。然后,我选择一个随机长度,从地图中提取该长度的单词,并将其附加到我当前生成的行的末尾,考虑空格或硬回车。它大约有一半的时间有效,但另一半的时间我陷入无限循环并且我的程序崩溃。
我遇到的一个问题是:当我向行中添加随机单词时,给定长度的单词组可能会耗尽。这是因为字典中每个长度的单词数量不一定相同,例如,可能只有一个长度为1的单词。所以,我可能需要一个给定长度的单词,但不再有任何该长度的单词都可用。
以下是我迄今为止所掌握的内容的摘要。我正在使用 ActionScript,但希望能以任何语言深入了解这个问题。非常感谢。
dictionary // map of words with word lengths as keys and arrays of corresponding words as values
lengths // array of word lengths, sorted numerically
min = lengths[0] // minimum word length
max = lengths[lengths.length - 1] // maximum word length
line = ""
while ( line.length < 60 ) {
len = lengths[round( rand() * ( lengths.length - 1 ) )]
if ( dictionary[len] != null && dictionary[len].length > 0 ) {
diff = 60 - line.length // number of characters needed to complete the line
if ( line.length + len + 1 == 60 ) {
// this word will complete the line exactly
line += dictionary[len].splice(0, 1) + "\n"
}
else if ( min + max + 2 >= diff ) {
// find the two word lengths that will complete the line
// ==> this is where I'm having trouble
}
else if ( line.length + len + 1 < 60 - max ) {
// this word will fit safely, so just add it
line += dictionary[len].splice(0, 1) + " "
}
if ( dictionary[len].length == 0 ) {
// delete any empty arrays and update min and max lengths accordingly
dictionary[len] = null
delete dictionary[len]
i = lengths.indexOf( len )
if ( i >= 0 ) {
// words of this length have been depleted, so
// update lengths array to ensure that next random
// length is valid
lengths.splice( i, 1 )
}
if ( lengths.indexOf( min ) == -1 ) {
// update the min
min = lengths[0]
}
if ( lengths.indexOf( max ) == -1 ) {
// update the max
max = lengths[lengths.length - 1]
}
}
}
}
I need to generate three lines of text (essentially jibberish) that are each 60 characters long, including a hard return at the end of each line. The lines are generated from a dictionary of words of various lengths (typically 1-8 characters). No word may be used more than once, and words must be separated by spaces. I think this is essentially a bin-packing problem.
The approach I've taken so far is to create a hashMap of the words, grouped by their lengths. I then choose a random length, pull a word of that length from the map, and append it to the end of the line I'm currently generating, accounting for spaces or a hard return. It works about half the time, but the other half of the time I'm getting stuck in an infinite loop and my program crashes.
One problem I'm running into is this: as I add random words to the lines, groups of words of a given length may become depleted. This is because there are not necessarily the same number of words of each length in the dictionary, e.g., there may only be one word with a length of 1. So, I might need a word of a given length, but there are no longer any words of that length available.
Below is a summary of what I have so far. I'm working in ActionScript, but would appreciate insight into this problem in any language. Many thanks in advance.
dictionary // map of words with word lengths as keys and arrays of corresponding words as values
lengths // array of word lengths, sorted numerically
min = lengths[0] // minimum word length
max = lengths[lengths.length - 1] // maximum word length
line = ""
while ( line.length < 60 ) {
len = lengths[round( rand() * ( lengths.length - 1 ) )]
if ( dictionary[len] != null && dictionary[len].length > 0 ) {
diff = 60 - line.length // number of characters needed to complete the line
if ( line.length + len + 1 == 60 ) {
// this word will complete the line exactly
line += dictionary[len].splice(0, 1) + "\n"
}
else if ( min + max + 2 >= diff ) {
// find the two word lengths that will complete the line
// ==> this is where I'm having trouble
}
else if ( line.length + len + 1 < 60 - max ) {
// this word will fit safely, so just add it
line += dictionary[len].splice(0, 1) + " "
}
if ( dictionary[len].length == 0 ) {
// delete any empty arrays and update min and max lengths accordingly
dictionary[len] = null
delete dictionary[len]
i = lengths.indexOf( len )
if ( i >= 0 ) {
// words of this length have been depleted, so
// update lengths array to ensure that next random
// length is valid
lengths.splice( i, 1 )
}
if ( lengths.indexOf( min ) == -1 ) {
// update the min
min = lengths[0]
}
if ( lengths.indexOf( max ) == -1 ) {
// update the max
max = lengths[lengths.length - 1]
}
}
}
}
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
这是基本想法。对于每一行,开始选择单词长度,并跟踪到目前为止的单词长度和总字符数。当您接近行尾时,选择的单词长度小于剩余的字符数。 (例如,如果还剩 5 个字符,请选择 2-5 个字符范围内的单词,计算空格。)如果达到 57 个字符,请选择一个 3 个字母的单词(计算回车)。如果达到 58 个字符,请选择一个 2 个字母的单词(计算返回值)。
如果需要,您可以在此时打乱单词长度,这样所有行都不会以短单词结尾。然后对于每个单词长度,选择一个该长度的单词并将其插入。
Here's the basic idea. For each line, start choosing word lengths, and keep track of the word lengths and total character count so far. As you get toward the end of the line, choose word lengths less than the number of characters you have left. (e.g. if you have 5 characters left, choose words in the range of 2-5 characters, counting the space.) If you get to 57 characters, pick a 3-letter word (counting return). If you get to 58 characters, pick a 2-letter word (counting return.)
If you want, you can shuffle the word lengths at this point, so all your lines don't end with short words. Then for each word length, pick a word of that length and plug it in.