Python code flow does not work as expected?
I am trying to process various texts with Python's regular expressions and NLTK (the NLTK book is at http://www.nltk.org/book). I am trying to create a random text generator and have run into a slight problem. First, here is my code flow:
1. Enter a sentence as input. This is called the trigger string and is assigned to a variable.
2. Get the longest word in the trigger string.
3. Search all of the Project Gutenberg texts for sentences that contain this word, regardless of case.
4. Return the longest sentence that contains the word from step 3.
5. Append the sentences from step 1 and step 4 together.
6. Assign the sentence from step 4 as the new trigger sentence and repeat the process. Note that I then have to get the longest word in the second sentence, and so on.
So far, I have been able to do this only once. When I try to keep it going, the program just keeps printing the first sentence my search yields. It should instead look for the longest word in this new sentence and keep applying the code flow described above.
Below is my code, along with a sample input and output:
Sample input
"Thane of code"
Sample output
"Thane of code Norway himselfe , with terrible numbers , Assisted by that most disloyall Traytor , The Thane of Cawdor , began a dismall Conflict , Till that Bellona ' s Bridegroome , lapt in proofe , Confronted him with selfe - comparisons , Point against Point , rebellious Arme ' gainst Arme , Curbing his lauish spirit : and to conclude , The Victorie fell on vs"
Now this should actually take the sentence that starts with 'Norway himselfe ...', look for the longest word in it, repeat the steps above, and so on, but it doesn't. Any suggestions? Thanks.
import nltk
from nltk.corpus import gutenberg

triggerSentence = raw_input("Please enter the trigger sentence: ")  # get input str
split_str = triggerSentence.split()  # split the sentence into words
longestLength = 0
longestString = ""
montyPython = 1

while montyPython:
    # code to find the longest word in the trigger sentence input
    for piece in split_str:
        if len(piece) > longestLength:
            longestString = piece
            longestLength = len(piece)

    listOfSents = gutenberg.sents()  # all sentences of gutenberg are assigned -list of list format-
    listOfWords = gutenberg.words()  # all words in gutenberg books -list format-
    # I tip my hat to Mr. Alex Martelli for this part, which helps me find the longest sentence
    lt = longestString.lower()  # this line tells you whether word list has the longest word in a case-insensitive way.
    longestSentence = max((listOfWords for listOfWords in listOfSents if any(lt == word.lower() for word in listOfWords)), key = len)
    # get longest sentence -list format with every word of sentence being an actual element-
    longestSent = [longestSentence]

    for word in longestSent:  # convert the list longestSentence to an actual string
        sstr = " ".join(word)
    print triggerSentence + " " + sstr
    triggerSentence = sstr
Comments (4)
How about this?
What happens? Hint: the answer starts with "Infinite". To correct the problem, you may find a set of lowercase words useful.
By the way, when do you expect montyPython to become False and the program to finish?
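One way to read this hint, sketched under assumptions: keep a set of lowercase words that have already served as search keys, and always pick the longest word that is not yet in that set, so the same word cannot be chosen forever. The names `next_word`, `generate`, and `toy_sents` are hypothetical, `toy_sents` is a tiny hand-made stand-in for `gutenberg.sents()`, and the loop is bounded by `max_steps` rather than driven by `montyPython`.

```python
def next_word(sentence, used):
    """Return the longest word whose lowercase form is not in `used`, or None."""
    candidates = [w for w in sentence.split() if w.lower() not in used]
    return max(candidates, key=len) if candidates else None


def generate(trigger, sents, max_steps=10):
    """Chain sentences together, never reusing a search word."""
    used = set()       # lowercase words already used as search keys
    chain = [trigger]
    for _ in range(max_steps):
        word = next_word(trigger, used)
        if word is None:          # every word in the trigger has been tried
            break
        key = word.lower()
        used.add(key)
        matches = [s for s in sents if any(key == w.lower() for w in s)]
        if matches:
            best = " ".join(max(matches, key=len))
            if best != trigger:   # only record a sentence that is actually new
                chain.append(best)
            trigger = best
    return chain


# Hypothetical stand-in for gutenberg.sents(): a list of token lists.
toy_sents = [
    ["The", "Thane", "of", "Cawdor", "began", "a", "dismall", "Conflict"],
    ["A", "conflict", "ends", "only", "when", "the", "reinforcements",
     "finally", "arrive"],
    ["Reinforcements", "came"],
]

chain = generate("Thane of code", toy_sents)
```

With this toy data the chain moves from the trigger to the "Thane" sentence and then to the "conflict" sentence, and the loop stops once every word has been used or `max_steps` is reached, instead of running forever.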
Rather than searching the entire corpus each time, it may be faster to construct a single map from word to the longest sentence containing that word. Here's my (untested) attempt to do this.
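The snippet below is not the answerer's code; it is a minimal sketch of the idea, assuming the corpus is a list of tokenized sentences like the one `gutenberg.sents()` returns. The function name `build_longest_sentence_map` and the sample data `toy_sents` are hypothetical.

```python
def build_longest_sentence_map(sents):
    """Map each lowercase word to the longest tokenized sentence containing it.

    On ties the earliest sentence is kept, which matches what max(..., key=len)
    does in the question's code.
    """
    longest = {}
    for sent in sents:
        # set() so a word repeated within one sentence is handled only once
        for word in set(w.lower() for w in sent):
            current = longest.get(word)
            if current is None or len(sent) > len(current):
                longest[word] = sent
    return longest


# Hypothetical stand-in for gutenberg.sents(): a list of token lists.
toy_sents = [
    ["The", "Thane", "of", "Cawdor", "began", "a", "dismall", "Conflict"],
    ["A", "conflict", "ends", "only", "when", "the", "reinforcements",
     "finally", "arrive"],
    ["Reinforcements", "came"],
]

index = build_longest_sentence_map(toy_sents)
thane_sentence = " ".join(index["thane"])
conflict_sentence = " ".join(index["conflict"])
```

The map is built in a single pass over the corpus, after which each lookup is a dictionary access rather than a full scan. The trade-off is memory: the dictionary holds one entry per distinct lowercase word.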
Mr. Hankin's answer is more elegant, but the following is more in keeping with the approach you began with:
Given your sample sentence, though, it reaches fixation in two cycles:
Part of the trouble with the response to your last question is that it did what you asked, but you asked a more specific question than the one you wanted answered. As a result, the response got bogged down in some rather complicated list expressions that I'm not sure you understood. I suggest that you make more liberal use of print statements, and that you not import code unless you know what it does. While unwrapping the list expressions I found (as noted) that you never used the corpus word list (`listOfWords`). Functions also help.
You are assigning "split_str" outside of the loop, so it gets the original value and then keeps it. You need to assign it at the beginning of the while loop, so it changes each time.
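A minimal sketch of that change, assuming a tiny hand-made list of tokenized sentences (`toy_sents`, hypothetical) in place of `gutenberg.sents()` and a bounded `for` loop in place of `while montyPython`. The helper names are hypothetical too. The sketch also restarts the longest-word tracking on every pass, which the question's loop likely needs as well, since `longestLength` is never reset and so only ever grows.

```python
def longest_word(sentence):
    """Return the longest whitespace-separated word; the first one wins ties."""
    longest = ""  # starts empty on every call, so nothing carries over
    for piece in sentence.split():  # re-split the *current* trigger each time
        if len(piece) > len(longest):
            longest = piece
    return longest


def longest_sentence_with(word, sents):
    """Return the longest tokenized sentence containing `word`, ignoring case."""
    target = word.lower()
    matches = [s for s in sents if any(target == w.lower() for w in s)]
    return max(matches, key=len) if matches else None


# Hypothetical stand-in for gutenberg.sents(): a list of token lists.
toy_sents = [
    ["The", "Thane", "of", "Cawdor", "began", "a", "dismall", "Conflict"],
    ["A", "conflict", "ends", "only", "when", "the", "reinforcements",
     "finally", "arrive"],
    ["Reinforcements", "came"],
]

trigger = "Thane of code"
words = []      # the longest word chosen on each pass
outputs = []    # the sentence found on each pass
for _ in range(3):  # bounded loop instead of `while montyPython`
    word = longest_word(trigger)  # recomputed from the new trigger each pass
    found = longest_sentence_with(word, toy_sents)
    if found is None:
        break
    words.append(word)
    trigger = " ".join(found)
    outputs.append(trigger)
```

With this toy data the search word changes from pass to pass, which is the behavior the question was after. The third pass returns the same sentence as the second, because the longest word of a sentence can lead straight back to that sentence, so some extra rule is still needed to keep the chain moving.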