Searching for a keyword in a document in Python

Posted on 2024-11-17 23:29:46

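I am trying to write a Python script that searches a document for a keyword and retrieves the whole sentence in which the keyword appears. From my research I found that acora can be used for this, but I still could not get it to work.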

人间☆小暴躁 2024-11-24 23:29:46

>>> import re
>>> text = """Hello, this is the first sentence. This is the second.
... And this may or may not be the third. Am I right? No? lol..."""
>>> # Split into rough sentences at runs of ., ?, ! or :
>>> s = re.split(r'[.?!:]+', text)
>>> def search(word, sentences):
...     # re.escape keeps regex metacharacters in the keyword literal
...     return [i for i in sentences if re.search(r'\b%s\b' % re.escape(word), i)]
...
>>> search('is', s)
['Hello, this is the first sentence', ' This is the second']

伴随着你 2024-11-24 23:29:46

Here is a simple way to do it in the interactive shell. You can turn it into a script yourself.

>>> text = '''this is sentence 1. and that is sentence
... 2. and sometimes sentences are good.
... when that's sentence 4, there's a good reason. and that's
... sentence 5.'''
>>> for line in text.split('.'):
...     if 'and' in line:
...         # collapse line breaks and extra spaces before printing
...         print(' '.join(line.split()))
...
and that is sentence 2
and sometimes sentences are good
and that's sentence 5

Here I split the text with .split('.'), iterated over the pieces, checked each one for the word and, and printed the pieces that contain it. Note that 'and' in line is a substring test, so it also matches words such as "sand".

Keep in mind that this search is case-sensitive. A real solution has to handle many more cases: for example, text ending with ! or ? is usually a sentence too, but not always.

This is a sentence (ha?) or do you think (!) so?

would be split, if you also split on ? and !, as

This is a sentence (ha
) or do you think (
) so
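
One way to soften both caveats is sketched below. It is only a minimal sketch under stated assumptions, not a complete sentence splitter: it splits only where ., ? or ! is followed by whitespace, so "(ha?)" stays inside its sentence, and it matches the keyword case-insensitively. The helper names split_sentences and search are made up for illustration, and abbreviations such as "e.g. " would still be split incorrectly.

```python
import re

def split_sentences(text):
    # Split only at whitespace that directly follows ., ? or !,
    # so a terminator inside parentheses, as in "(ha?)", does not end the sentence.
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]

def search(word, text):
    # re.escape keeps regex metacharacters in the keyword literal;
    # re.IGNORECASE makes the whole-word match case-insensitive.
    pattern = re.compile(r"\b%s\b" % re.escape(word), re.IGNORECASE)
    return [s for s in split_sentences(text) if pattern.search(s)]

text = "This is a sentence (ha?) or do you think (!) so? And here is another one."
print(search("THINK", text))
# ['This is a sentence (ha?) or do you think (!) so?']
```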

小梨窩很甜 2024-11-24 23:29:46

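I don't have much experience with this, but you may be looking for nltk.

Try this: use span_tokenize to find which span the index of your word falls in, then look up that sentence.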
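
The span-lookup idea can be sketched without installing anything. The code below is an illustration only: it swaps in a simple regex for nltk's sentence tokenizer (so the boundaries are cruder than what nltk would produce), and the names sentence_spans and sentences_containing are hypothetical helpers, not part of nltk.

```python
import re

def sentence_spans(text):
    # (start, end) offsets of rough sentences: a run of non-terminators
    # followed by any trailing ., ? or ! characters.
    return [m.span() for m in re.finditer(r"[^.?!]+[.?!]*", text)]

def sentences_containing(word, text):
    # Find each whole-word match, then look up the span its index falls in.
    spans = sentence_spans(text)
    found = []
    for m in re.finditer(r"\b%s\b" % re.escape(word), text):
        for start, end in spans:
            if start <= m.start() < end:
                sentence = text[start:end].strip()
                if sentence not in found:  # avoid repeats for multiple hits
                    found.append(sentence)
                break
    return found

text = "Hello, this is the first sentence. This is the second. And this may not be the third."
print(sentences_containing("is", text))
# ['Hello, this is the first sentence.', 'This is the second.']
```

With nltk installed, the regex in sentence_spans would be replaced by the spans from its tokenizer; the lookup loop stays the same.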

呆橘 2024-11-24 23:29:46

You can call the grep or egrep command through Python's subprocess module; that may help. Note that grep returns matching lines, not sentences.

For example:

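import subprocess

# Run grep without a shell; each argument is a separate list item.
result = subprocess.run(["grep", "word1", "document.txt"],
                        capture_output=True, text=True)
# To search for either of two words, use an extended regex instead:
# result = subprocess.run(["grep", "-E", "word1|word2", "document.txt"],
#                         capture_output=True, text=True)
lines = result.stdout.splitlines()  # one entry per matching line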
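
Where grep is not available, the same line-level filter can be approximated in pure Python. This is a rough sketch rather than a full grep replacement: grep_lines is a hypothetical helper, and io.StringIO stands in for an open file such as document.txt so that the example runs on its own.

```python
import io
import re

def grep_lines(pattern, lines):
    """Yield each line (without its trailing newline) that matches the regex."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            yield line.rstrip("\n")

# io.StringIO is a stand-in for open("document.txt"); any iterable of lines works.
document = io.StringIO("word1 is here\nnothing to see\nword2 shows up\n")
print(list(grep_lines(r"word1|word2", document)))
# ['word1 is here', 'word2 shows up']
```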
