Searching for a keyword in a document in Python

Posted 2024-11-17 23:29:46 · 87 words · 0 views · 0 comments

I am trying to write a Python script that searches a document for a keyword and retrieves the entire sentence containing it. From my research, acora looked like it could be used for this, but I have not gotten it to work.

Comments (4)

人间☆小暴躁 2024-11-24 23:29:46
>>> text = """Hello, this is the first sentence. This is the second.
And this may or may not be the third. Am I right? No? lol..."""

>>> import re
>>> s = re.split(r'[.?!:]+', text)
>>> def search(word, sentences):
...     # re.escape keeps any regex metacharacters in the keyword literal.
...     return [i for i in sentences if re.search(r'\b%s\b' % re.escape(word), i)]
...
>>> search('is', s)
['Hello, this is the first sentence', ' This is the second']
伴随着你 2024-11-24 23:29:46

That's how you can do it simply in the shell. You should write it as a script yourself.

>>> text = '''this is sentence 1. and that is sentence
              2. and sometimes sentences are good.
              when that's sentence 4, there's a good reason. and that's
              sentence 5.'''
>>> for line in text.split('.'):
...     if 'and' in line:
...         # Collapse the line breaks and indentation left over from the literal.
...         print(' '.join(line.split()))
...
and that is sentence 2
and sometimes sentences are good
and that's sentence 5

Here I split text with .split('.'), iterated over the pieces, and printed each piece that contains the word and.

Keep in mind that this check is case-sensitive, and that it is a substring check, so it would also match a word such as "sand". A real solution has to handle many more cases; for example, text ending with ! or ? is also a sentence (but sometimes it isn't).

This is a sentence (ha?) or do you think (!) so?

is going to be split as

  • This is a sentence (ha
  • ) or do you think (
  • ) so
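
The caveats above can be partly addressed with a regular expression. What follows is only a rough sketch, not a complete sentence splitter: it assumes a sentence ends at ., ! or ? followed by whitespace, and the names SENTENCE_END and find_sentences are made up for illustration. Abbreviations such as "e.g. " would still be split incorrectly.

```python
import re

# Split after ., ! or ? only when whitespace follows, so punctuation
# inside parentheses such as "(ha?)" does not end a sentence.
SENTENCE_END = re.compile(r'(?<=[.!?])\s+')

def find_sentences(keyword, text):
    """Return sentences containing keyword as a whole word, ignoring case."""
    pattern = re.compile(r'\b%s\b' % re.escape(keyword), re.IGNORECASE)
    sentences = SENTENCE_END.split(text.strip())
    return [s for s in sentences if pattern.search(s)]

text = "This is a sentence (ha?) or do you think (!) so? And that is another one."
print(find_sentences("AND", text))
# prints ['And that is another one.']
```

Because the split happens on the whitespace after the terminator, each returned sentence keeps its own closing punctuation.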
小梨窩很甜 2024-11-24 23:29:46

I don't have much experience with this, but you might be looking for nltk.

Try this: use span_tokenize to find which span the index of your word falls under, then look that sentence up.
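
The span lookup described above might look roughly like this. To stay free of dependencies, the sketch does not call nltk: the hypothetical helper sentence_spans uses a simple regular expression as a stand-in for a tokenizer's span_tokenize, and sentence_at is likewise an illustrative name. With nltk installed, the spans would presumably come from a sentence tokenizer's span_tokenize instead.

```python
import re

def sentence_spans(text):
    """Return (start, end) offsets of sentences; a rough stand-in for span_tokenize."""
    # A sentence starts at a non-space character and runs through any
    # trailing ., ! or ? characters (or to the end of the text).
    return [m.span() for m in re.finditer(r'\S[^.!?]*[.!?]*', text)]

def sentence_at(text, keyword):
    """Return the sentence whose span contains the first occurrence of keyword."""
    index = text.find(keyword)  # substring match, first occurrence only
    if index == -1:
        return None
    for start, end in sentence_spans(text):
        if start <= index < end:
            return text[start:end]
    return None

text = "Hello, this is the first sentence. This is the second. Am I right?"
print(sentence_at(text, "second"))
# prints This is the second.
```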

呆橘 2024-11-24 23:29:46

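Use the grep or egrep command through Python's subprocess module; it may help. Note that grep returns matching lines, not sentences.

For example:

import subprocess

# Pass the command as an argument list so no shell quoting is needed.
result = subprocess.run(["grep", "word1", "document.txt"],
                        capture_output=True, text=True)
# To search for two different words, use an extended regex instead:
# result = subprocess.run(["grep", "-E", "word1|word2", "document.txt"],
#                         capture_output=True, text=True)
lines = result.stdout.splitlines()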
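
If grep is not available (on Windows, for instance), the same line-based filtering can be approximated in pure Python. This is a sketch under assumptions rather than a drop-in replacement: grep_lines is a hypothetical helper, and it treats each word as a literal string instead of a grep pattern.

```python
import re

def grep_lines(words, lines):
    """Return the lines containing any of the given words (like egrep 'w1|w2')."""
    pattern = re.compile("|".join(re.escape(w) for w in words))
    return [line.rstrip("\n") for line in lines if pattern.search(line)]

# Any iterable of lines works, including an open file object.
sample = ["word1 appears here\n", "nothing to see\n", "and word2 here\n"]
print(grep_lines(["word1", "word2"], sample))
# prints ['word1 appears here', 'and word2 here']
```

To search a real file, pass the open file in place of the list, for example `with open("document.txt") as f: grep_lines(["word1"], f)`.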