Project Gutenberg Python problem?

Posted on 2024-09-15 11:28:22


I am trying to process various texts with Python regular expressions and NLTK (the NLTK book is at http://www.nltk.org/book). I am trying to create a random text generator and I have run into a problem. First, here is my algorithm:

1. Take a sentence as input (this is called the trigger string).
2. Get the longest word in the trigger string.
3. Search the entire Project Gutenberg corpus for sentences that contain this word, regardless of case.
4. Return the longest sentence that contains the word from step 3.
5. Append the sentence from step 4 to the sentence from step 1.
6. Repeat the process, each time taking the longest word from the most recently added sentence.
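
A minimal sketch of step 2, assuming a "word" is any run of letters or apostrophes. The function name longest_word is hypothetical and is not part of NLTK:

```python
import re

def longest_word(sentence):
    """Return the longest word in a sentence; the first one wins on ties."""
    # Assumption: a word is a run of letters or apostrophes, so punctuation is ignored.
    words = re.findall(r"[A-Za-z']+", sentence)
    if not words:
        raise ValueError("no words found in trigger string")
    return max(words, key=len)

trigger = "Something wicked this way comes"
print(longest_word(trigger))  # Something
```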

So far I have been able to do this for the first two sentences, but I cannot perform a case-insensitive search. The entire sentence database of Project Gutenberg is available through the gutenberg.sents() function, but a case-insensitive regex search seems practically impossible, because gutenberg.sents() returns the sentences of each book as a list of lists of tokens.

Example: all the sentences of Shakespeare's Macbeth are retrieved by typing

import nltk

from nltk.corpus import gutenberg 

gutenberg.sents('shakespeare-macbeth.txt')

into the Python shell, and the output is:

[['[', 'The', 'Tragedie', 'of', 'Macbeth', 'by', 'William', 'Shakespeare', '1603', ']'], 
['Actus', 'Primus', '.'], .......]

where [The Tragedie of Macbeth by William Shakespeare 1603] and Actus Primus. are the first two sentences.

How can I find the word I'm looking for regardless of whether it is uppercase or lowercase? I badly need help: I have been tinkering with this for the past two days and it is starting to wear on my nerves. Thanks a lot.


阳光下的泡沫是彩色的 2024-09-22 11:28:22

Given a list L of words, and a target word t,

any(t.lower()==w.lower() for w in L)

tells you, case-insensitively, whether L contains the word t. It is faster, of course, to do

lt = t.lower()
any(lt==w.lower() for w in L)

since Python does not "hoist" the constant computation t.lower() out of the loop: unless you hoist it yourself, it is recomputed for every word in L.
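
As a small illustration, the hoisted version can be wrapped in a helper. The name contains_word is hypothetical, and the token list is the second Macbeth sentence quoted in the question:

```python
def contains_word(words, target):
    """Return True if target occurs in words, ignoring case."""
    lowered = target.lower()  # computed once, outside the loop
    return any(lowered == w.lower() for w in words)

sentence = ['Actus', 'Primus', '.']
print(contains_word(sentence, 'PRIMUS'))   # True
print(contains_word(sentence, 'Macbeth'))  # False
```

For text outside plain ASCII, str.casefold() could be swapped in for str.lower(); that is a suggestion beyond what the answer uses.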

Given a list of lists lol, the longest sub-list including t can be found by

longest = max((L for L in lol if any(lt==w.lower() for w in L)), key=len)

If multiple sub-lists include t and are of the same maximal length, this will give you the first one, as it happens.
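
A sketch of the same expression as a function, run on a made-up toy list so the tie behaviour is visible. The name longest_sentence_with and the default=None argument are additions here: without a default, max raises ValueError when no sub-list contains t.

```python
def longest_sentence_with(sentences, target):
    """Return the longest token list containing target (case-insensitive), or None."""
    lowered = target.lower()
    matches = (s for s in sentences if any(lowered == w.lower() for w in s))
    # default=None avoids the ValueError that max raises on an empty iterable.
    return max(matches, key=len, default=None)

# Toy data, invented for illustration (not real corpus output).
toy = [
    ['Actus', 'Primus', '.'],
    ['The', 'Tragedie', 'of', 'Macbeth'],
    ['MACBETH', 'does', 'murder', 'sleep'],
    ['Exit', '.'],
]

print(longest_sentence_with(toy, 'macbeth'))  # ['The', 'Tragedie', 'of', 'Macbeth']
print(longest_sentence_with(toy, 'banquo'))   # None
```

Both matching sub-lists have four tokens, and the first one is returned.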


缱倦旧时光 2024-09-22 11:28:22

How about using the built-in method str.lower(), which returns a copy of the string converted to lowercase?

Then just compare the strings.
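
Putting the lowercase comparison together with the loop described in the question, a rough sketch might look like this. Everything here is an assumption for illustration: the function names, the toy corpus, the rounds parameter, and the rule of never reusing a sentence (added so the chain cannot cycle on one sentence forever).

```python
import re

def longest_word(text):
    """Longest run of letters/apostrophes in text, or None if there is none."""
    words = re.findall(r"[A-Za-z']+", text)
    return max(words, key=len) if words else None

def generate(trigger, sentences, rounds=3):
    """Chain sentences: each round searches for the longest word of the previous one."""
    output = [trigger]
    used = set()  # assumption: each corpus sentence is used at most once
    current = trigger
    for _ in range(rounds):
        word = longest_word(current)
        if word is None:
            break
        target = word.lower()  # lowercase once, then compare lowercase tokens
        candidates = [
            (i, s) for i, s in enumerate(sentences)
            if i not in used and any(target == w.lower() for w in s)
        ]
        if not candidates:
            break
        index, best = max(candidates, key=lambda pair: len(pair[1]))
        used.add(index)
        # Joining tokens with spaces leaves a space before punctuation.
        current = " ".join(best)
        output.append(current)
    return " ".join(output)

# Toy corpus, invented for illustration. With NLTK and its Gutenberg corpus
# installed, gutenberg.sents() could be passed as `sentences` instead.
toy = [
    ['Macbeth', 'hath', 'murdered', 'sleep', '.'],
    ['Sleep', 'no', 'more', '!'],
    ['The', 'murdered', 'king', 'lies', 'here', '.'],
]

print(generate("I saw Macbeth", toy, rounds=2))
```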


About the author: 雨夜星沙
