Navigating a text file search in Python
Here is a sample of the text file I am working with:
<Opera>
Tristan/NNP
and/CC
Isolde/NNP
and/CC
the/DT
fatalistic/NN
horns/VBZ
The/DT
passionate/JJ
violins/NN
And/CC
ominous/JJ
clarinet/NN
;/:
The capital letters after the forward slashes are weird tags. I want to be able to search the file for something like "NNP,CC,NNP" and have the program return, for this segment, "Tristan and Isolde": the three words in a row that match those three tags in a row.
The problem I am having is that I want the search string to be entered by the user, so it will always be different.
I can read the file and find one match, but I do not know how to count backwards from that point to print the first word, or how to check whether the next tag matches.
Comments (3)
Build a regular expression dynamically from a list of tags you want to search:
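A minimal sketch of that idea, assuming the file holds one `word/TAG` token per line as in the question. The names `SAMPLE`, `build_tag_pattern`, and `find_tag_sequences` are made up for illustration and are not taken from the original answer:

```python
import re

# Sample text copied from the question; in practice it would be read from the file.
SAMPLE = """<Opera>
Tristan/NNP
and/CC
Isolde/NNP
and/CC
the/DT
fatalistic/NN
horns/VBZ
The/DT
passionate/JJ
violins/NN
And/CC
ominous/JJ
clarinet/NN
;/:
"""

def build_tag_pattern(tags):
    """Compile a regex with one captured word per requested tag."""
    # Each part matches "word/TAG" and captures only the word.
    parts = [r"([^\s/]+)/" + re.escape(tag) for tag in tags]
    # The lookarounds keep the match aligned on whole tokens,
    # so a query for "NN" does not match a token tagged "NNP".
    return re.compile(r"(?<!\S)" + r"\s+".join(parts) + r"(?!\S)")

def find_tag_sequences(text, query):
    """Return the word runs whose tags match a comma-separated query."""
    tags = [t.strip() for t in query.split(",") if t.strip()]
    if not tags:
        return []
    pattern = build_tag_pattern(tags)
    return [" ".join(m.groups()) for m in pattern.finditer(text)]

print(find_tag_sequences(SAMPLE, "NNP,CC,NNP"))  # ['Tristan and Isolde']
```

Because `finditer` reports non-overlapping matches only, a run such as NNP CC NNP CC NNP would yield one hit for this query rather than two.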
Similarly, you can do what you need.
EDIT: More generalized.
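One general way to handle a search string of any length, sketched here without regular expressions: keep the words and tags in two parallel lists and slide a window of the query's length over the tag list. The helper names `parse_tagged` and `search_tags` are hypothetical, and skipping angle-bracket lines such as `<Opera>` is an assumption about the file format:

```python
SAMPLE_LINES = [
    "<Opera>", "Tristan/NNP", "and/CC", "Isolde/NNP", "and/CC", "the/DT",
    "fatalistic/NN", "horns/VBZ", "The/DT", "passionate/JJ", "violins/NN",
    "And/CC", "ominous/JJ", "clarinet/NN", ";/:",
]

def parse_tagged(lines):
    """Split "word/TAG" lines into parallel lists of words and tags."""
    words, tags = [], []
    for line in lines:
        token = line.strip()
        # Assumption: lines like "<Opera>" are section markers, not tokens.
        if not token or (token.startswith("<") and token.endswith(">")):
            continue
        if "/" not in token:
            continue
        # rpartition splits on the last slash, so the tag is whatever follows it.
        word, _, tag = token.rpartition("/")
        words.append(word)
        tags.append(tag)
    return words, tags

def search_tags(words, tags, query):
    """Return every run of words whose tags equal the comma-separated query."""
    wanted = [t.strip() for t in query.split(",") if t.strip()]
    n = len(wanted)
    if n == 0:
        return []
    hits = []
    for i in range(len(tags) - n + 1):
        # Index i is the start of the window, so there is no need to count backwards.
        if tags[i:i + n] == wanted:
            hits.append(" ".join(words[i:i + n]))
    return hits

words, tags = parse_tagged(SAMPLE_LINES)
print(search_tags(words, tags, "NNP,CC,NNP"))  # ['Tristan and Isolde']
```

In an interactive script the query could come from `input()`, and the lines from iterating over the open file. Unlike a non-overlapping regex search, the window also reports overlapping matches.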
It appears your source text may have been produced by the Natural Language Toolkit (nltk).
Using nltk, you could tokenize the text, split each token into a (word, part_of_speech) tuple, and iterate through the n-grams to find those that match the pattern:
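A sketch of that approach written with the standard library only, so it runs without nltk installed. The helpers `str_to_tuple` and `ngrams` are hypothetical stand-ins defined here; nltk ships `nltk.tag.str2tuple` and `nltk.util.ngrams`, which could take their place:

```python
TEXT = """<Opera>
Tristan/NNP and/CC Isolde/NNP and/CC the/DT fatalistic/NN horns/VBZ
The/DT passionate/JJ violins/NN And/CC ominous/JJ clarinet/NN ;/:
"""

def str_to_tuple(token, sep="/"):
    """Turn "word/TAG" into a (word, tag) tuple, splitting on the last separator."""
    word, _, tag = token.rpartition(sep)
    return (word, tag)

def ngrams(items, n):
    """Yield every run of n consecutive items as a tuple."""
    return zip(*(items[i:] for i in range(n)))

def match_pattern(text, pattern):
    """Return the word sequences whose tags equal the given list of tags."""
    # Whitespace tokenization; tokens without a slash (e.g. "<Opera>") are skipped.
    tagged = [str_to_tuple(tok) for tok in text.split() if "/" in tok]
    wanted = tuple(pattern)
    hits = []
    for gram in ngrams(tagged, len(wanted)):
        if tuple(tag for _, tag in gram) == wanted:
            hits.append(" ".join(word for word, _ in gram))
    return hits

print(match_pattern(TEXT, ["NNP", "CC", "NNP"]))  # ['Tristan and Isolde']
```

Keeping each word paired with its tag means a matching n-gram already carries the words to print, so no separate index bookkeeping is needed.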