Python code flow not working as expected?

Published 2024-09-16


I am trying to process various texts by regex and NLTK of python -which is at http://www.nltk.org/book-. I am trying to create a random text generator and I am having a slight problem. Firstly, here is my code flow:

  1. Enter a sentence as input -this is called trigger string, is assigned to a variable-

  2. Get longest word in trigger string

  3. Search all Project Gutenberg database for sentences that contain this word -regardless of uppercase lowercase-

  4. Return the longest sentence that has the word I spoke about in step 3

  5. Append the sentences from Step 1 and Step 4 together

  6. Assign the sentence in Step 4 as the new 'trigger' sentence and repeat the process. Note that I have to get the longest word in second sentence and continue like that and so on-

So far, I have been able to do this only once. When I try to make it continue, the program just keeps printing the first sentence my search yields. It should actually look for the longest word in this new sentence and keep applying my code flow described above.

Below is my code along with a sample input/output :

Sample input

"Thane of code"

Sample output

"Thane of code Norway himselfe , with terrible numbers , Assisted by that most disloyall Traytor , The Thane of Cawdor , began a dismall Conflict , Till that Bellona ' s Bridegroome , lapt in proofe , Confronted him with selfe - comparisons , Point against Point , rebellious Arme ' gainst Arme , Curbing his lauish spirit : and to conclude , The Victorie fell on vs"

Now this should actually take the sentence that starts with 'Norway himselfe....', look for the longest word in it, and repeat the steps above, but it doesn't. Any suggestions? Thanks.

import nltk

from nltk.corpus import gutenberg

triggerSentence = raw_input("Please enter the trigger sentence: ")#get input str

split_str = triggerSentence.split()#split the sentence into words

longestLength = 0

longestString = ""

montyPython = 1

while montyPython:

    #code to find the longest word in the trigger sentence input
    for piece in split_str:
        if len(piece) > longestLength:
            longestString = piece
            longestLength = len(piece)


    listOfSents = gutenberg.sents() #all sentences of gutenberg are assigned -list of list format-

    listOfWords = gutenberg.words()# all words in gutenberg books -list format-
    # I tip my hat to Mr.Alex Martelli for this part, which helps me find the longest sentence
    lt = longestString.lower() #this line tells you whether word list has the longest word in a case-insensitive way. 

    longestSentence = max((listOfWords for listOfWords in listOfSents if any(lt == word.lower() for word in listOfWords)), key = len)
    #get longest sentence -list format with every word of sentence being an actual element-

    longestSent=[longestSentence]

    for word in longestSent:#convert the list longestSentence to an actual string
        sstr = " ".join(word)
    print triggerSentence + " "+ sstr
    triggerSentence = sstr

Comments (4)

陌上青苔 2024-09-23 05:06:42


How about this?

  1. You find the longest word in the trigger.
  2. You find the longest word in the longest sentence containing the word found in 1.
  3. The word from 1. is still the longest word of the sentence from 2.

What happens? Hint: the answer starts with "Infinite". To correct the problem, you may find a set of lower-cased words useful.

BTW, when do you think montyPython becomes False and the program finishes?
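Following that hint, here is a minimal Python 3 sketch of one way to guarantee termination: keep a set of already-used trigger words so each step consumes a word and the chain cannot bounce between the same word and sentence forever. The toy corpus and sentence contents are made up for illustration; they stand in for Project Gutenberg.

```python
def longest_unused_word(words, used):
    """Return the longest word not yet used as a trigger, or None."""
    candidates = [w for w in words if w.lower() not in used]
    return max(candidates, key=len) if candidates else None

def chain(trigger, corpus, max_steps=10):
    """Run the question's word-to-sentence chain, skipping used words."""
    used = set()                      # trigger words already consumed
    result = [trigger]
    words = trigger.split()
    for _ in range(max_steps):
        target = longest_unused_word(words, used)
        if target is None:
            break                     # every word used up -> stop, no infinite loop
        used.add(target.lower())
        matches = [s for s in corpus
                   if target.lower() in (w.lower() for w in s)]
        if not matches:
            break
        words = max(matches, key=len) # longest matching sentence (token list)
        result.append(" ".join(words))
    return result

corpus = [
    "wandering shadows gather near the broken lighthouse tower".split(),
    "a wandering merchant carried strange spices from distant lands".split(),
    "every merchant remembers the first voyage across the endless northern sea".split(),
]
print(chain("the old lighthouse", corpus))
```

Note the set only guarantees termination; a sentence can still re-match itself on its own remaining words, so repeats in the output are possible.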

┈┾☆殇 2024-09-23 05:06:42


Rather than searching the entire corpus each time, it may be faster to construct a single map from word to the longest sentence containing that word. Here's my (untested) attempt to do this.

import collections
from nltk.corpus import gutenberg

def words_in(sentence):
    """Generate all words in the sentence (lower-cased)"""
    for word in sentence.split():
        word = word.strip('.,"\'-:;')
        if word:
            yield word.lower()

def make_sentence_map(sentences):
    """Construct a map from words to the longest sentence containing the word."""
    result = collections.defaultdict(str)
    for tokens in sentences:
        # gutenberg.sents() yields token lists; join each into one string
        sentence = " ".join(tokens)
        for word in words_in(sentence):
            if len(sentence) > len(result[word]):
                result[word] = sentence
    return result

def generate_random_text(sentence, sentence_map):
    while True:
        yield sentence
        longest_word = max(words_in(sentence), key=len)
        sentence = sentence_map[longest_word]

sentence_map = make_sentence_map(gutenberg.sents())
for sentence in generate_random_text('Thane of code.', sentence_map):
    print sentence
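A runnable Python 3 miniature of the same precomputation idea, with a toy list of token lists standing in for gutenberg.sents() (the sentence contents are invented), to show the shape of the lookup table: one pass over the corpus, then O(1) lookups per step.

```python
import collections

def make_sentence_map(sentences):
    """Map each lower-cased token to the longest sentence (token list) containing it."""
    result = collections.defaultdict(list)
    for sentence in sentences:
        # dedupe tokens so each sentence is compared at most once per word
        for word in set(w.lower() for w in sentence):
            if len(sentence) > len(result[word]):
                result[word] = sentence
    return result

toy_sents = [
    ["Norway", "himselfe", ",", "with", "terrible", "numbers"],
    ["terrible", "things", "happen"],
]
smap = make_sentence_map(toy_sents)
# "terrible" appears in both sentences; the longer one wins
print(" ".join(smap["terrible"]))
```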
秋日私语 2024-09-23 05:06:42


Mr. Hankin's answer is more elegant, but the following is more in keeping with the approach you began with:

import sys
import string
import nltk
from nltk.corpus import gutenberg

def longest_element(p):
    """return the first element of p which has the greatest len()"""
    max_len = 0
    elem = None
    for e in p:
        if len(e) > max_len:
            elem = e
            max_len = len(e)
    return elem

def downcase(p):
    """returns a list of words in p shifted to lower case"""
    return map(string.lower, p)


def unique_words():
    """it turns out unique_words was never referenced so this is here
       for pedagogy"""
    # there are 2.6 million words in the gutenburg corpus but only ~42k unique
    # ignoring case, let's pare that down a bit
    words = set()
    for word in gutenberg.words():
        words.add(word.lower())
    print 'gutenberg.words() has', len(words), 'unique caseless words'
    return words

print 'loading gutenburg corpus...'
sentences = []
for sentence in gutenberg.sents():
    sentences.append(downcase(sentence))

trigger = sys.argv[1:]
target = longest_element(trigger).lower()
last_target = None

while target != last_target:
    matched_sentences = []
    for sentence in sentences:
        if target in sentence:
            matched_sentences.append(sentence)

    print '===', target, 'matched', len(matched_sentences), 'sentences'
    longestSentence = longest_element(matched_sentences)
    print ' '.join(longestSentence)

    trigger = longestSentence
    last_target = target
    target = longest_element(trigger).lower()

Given your sample sentence though, it reaches fixation in two cycles:

$ python nltkgut.py Thane of code
loading gutenburg corpus...
=== target thane matched 24 sentences
norway himselfe , with terrible
numbers , assisted by that most
disloyall traytor , the thane of
cawdor , began a dismall conflict ,
till that bellona ' s bridegroome ,
lapt in proofe , confronted him with
selfe - comparisons , point against
point , rebellious arme ' gainst arme
, curbing his lauish spirit : and to
conclude , the victorie fell on vs
=== target bridegroome matched 1 sentences
norway himselfe , with
terrible numbers , assisted by that
most disloyall traytor , the thane of
cawdor , began a dismall conflict ,
till that bellona ' s bridegroome ,
lapt in proofe , confronted him with
selfe - comparisons , point against
point , rebellious arme ' gainst arme
, curbing his lauish spirit : and to
conclude , the victorie fell on vs

Part of the trouble with the response to the last problem is that it did what you asked, but you asked a more specific question than you wanted an answer to. Thus the response got bogged down in some rather complicated list expressions that I'm not sure you understood. I suggest that you make more liberal use of print statements and don't import code if you don't know what it does. While unwrapping the list expressions I found (as noted) that you never used the corpus wordlist. Functions are a help also.
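In that spirit, here is the question's `max(...)` one-liner unwrapped into explicit loops (Python 3, over a made-up toy list of token lists of the shape gutenberg.sents() returns), so each step can be inspected with a print:

```python
toy_sents = [
    ["The", "Thane", "of", "Cawdor"],
    ["a", "thane", "began", "a", "dismall", "Conflict", "here"],
    ["no", "match", "anywhere"],
]
lt = "thane"  # the lower-cased target word

# equivalent one-liner from the question:
# longest = max((s for s in toy_sents if any(lt == w.lower() for w in s)), key=len)
longest = None
for sentence in toy_sents:
    # case-insensitive membership test, as in the original
    if any(lt == word.lower() for word in sentence):
        # keep the longest matching sentence seen so far
        if longest is None or len(sentence) > len(longest):
            longest = sentence
print(" ".join(longest))
```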

得不到的就毁灭 2024-09-23 05:06:42


You are assigning "split_str" outside of the loop, so it gets the original value and then keeps it. You need to assign it at the beginning of the while loop, so it changes each time.

import nltk

from nltk.corpus import gutenberg

triggerSentence = raw_input("Please enter the trigger sentence: ")#get input str

longestLength = 0

longestString = ""

montyPython = 1

while montyPython:
    #so this is run every time through the loop
    split_str = triggerSentence.split()#split the sentence into words

    #code to find the longest word in the trigger sentence input
    for piece in split_str:
        if len(piece) > longestLength:
            longestString = piece
            longestLength = len(piece)


    listOfSents = gutenberg.sents() #all sentences of gutenberg are assigned -list of list format-

    listOfWords = gutenberg.words()# all words in gutenberg books -list format-
    # I tip my hat to Mr.Alex Martelli for this part, which helps me find the longest sentence
    lt = longestString.lower() #this line tells you whether word list has the longest word in a case-insensitive way. 

    longestSentence = max((listOfWords for listOfWords in listOfSents if any(lt == word.lower() for word in listOfWords)), key = len)
    #get longest sentence -list format with every word of sentence being an actual element-

    longestSent=[longestSentence]

    for word in longestSent:#convert the list longestSentence to an actual string
        sstr = " ".join(word)
    print triggerSentence + " "+ sstr
    triggerSentence = sstr
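The fix above still leaves two quirks in the loop: longestLength and longestString are never reset, so a shorter longest word from a later sentence can never displace the old one, and montyPython never becomes false. A Python 3 sketch with all three issues addressed might look like the following; the toy corpus is invented and stands in for gutenberg.sents(), and the loop stops when the sentence repeats.

```python
toy_sents = [
    ["Norway", "himselfe", ",", "with", "terrible", "numbers"],
    ["the", "numbers", "do", "not", "lie"],
]

trigger = "Thane of numbers"
previous = None
while trigger != previous:          # stop once the sentence stops changing
    previous = trigger
    # recomputed every pass -- the original kept these across iterations
    longest_word = max(trigger.split(), key=len)
    lt = longest_word.lower()
    matches = [s for s in toy_sents
               if any(lt == w.lower() for w in s)]
    if not matches:
        break
    sstr = " ".join(max(matches, key=len))
    print(trigger + " " + sstr)
    trigger = sstr
```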