How to search backwards with a regex on a multi-line string in Python
I'm wondering if there's an efficient way of doing the following:
I have a Python script that reads an entire file into a single string. Then, given the location of a token of interest, I'd like to find the string index of the beginning of the line containing that token.
import re

with open("foo.txt") as f:
    file_str = f.read()
token_pos = re.search("token", file_str).start()
# This does not work: str.rfind() takes a plain substring, not a regex, so it
# looks for a literal "^", and there is no way to pass re.M:
beginning_of_line = file_str.rfind("^", 0, token_pos)
I could use a greedy regex to find the last beginning of a line before the token, but this has to be done many times, and I'm concerned about scanning the whole file on each iteration. Is there a good way to do this?
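The following is one regex-free way to get the line start. It is an illustration only and is not part of the original question. It searches backwards for the last newline before token_pos with str.rfind. The helper name line_start is made up for this example, and the sample string replaces reading foo.txt so that the snippet runs on its own.

```python
import re


def line_start(text: str, pos: int) -> int:
    """Hypothetical helper: index of the first character of the line
    containing text[pos]."""
    # rfind returns -1 when there is no newline before pos, so +1 yields 0,
    # i.e. the position is on the first line.
    return text.rfind("\n", 0, pos) + 1


file_str = "first line\nsome token here\nlast line"
token_pos = re.search("token", file_str).start()
beginning_of_line = line_start(file_str, token_pos)
print(beginning_of_line)                             # 11
print(repr(file_str[beginning_of_line:token_pos]))   # 'some '
```

rfind searches from the right end of the slice, so in CPython it should only need to look back as far as the previous newline rather than rescanning the file. This is an implementation detail, not a documented guarantee.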
----------------- EDIT ----------------
I tried to keep the question as simple as possible, but it looks like more details are required. Here's a better example of one of the things I'm trying to do:
file_str = """
{
blah {
{} {{} "string with unmatched }" }
}
}"""
I happen to know the opening and closing positions of blah's braces. I need to get the lines between the braces (non-inclusive). So, given the position of the closing brace, I need to find the beginning of the line containing it. I'd like to do something akin to a reverse regex search to find it. I could, of course, write a special-purpose function for this, but I was hoping there would be a more Pythonic way. To complicate things further, I have to do this several times per file, and the file string can change between iterations, so pre-indexing doesn't really work either.
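The following is a possible sketch for the brace example. It assumes that open_pos and close_pos are the already-known indices of blah's "{" and "}". The kept region starts just after the newline that follows the opening brace. It ends at the start of the closing brace's line. The name lines_between is hypothetical. The demo computes the two positions with str.index only because the question does not show how they are obtained.

```python
def lines_between(text: str, open_pos: int, close_pos: int) -> str:
    """Hypothetical helper: the full lines strictly between the line holding
    text[open_pos] and the line holding text[close_pos]."""
    nl = text.find("\n", open_pos)
    if nl == -1:  # no line break after the opening brace
        return ""
    start = nl + 1  # start of the line after the opening brace's line
    # Start of the closing brace's line (0 if it is on the first line).
    end = text.rfind("\n", 0, close_pos) + 1
    return text[start:end] if start < end else ""


file_str = """
{
blah {
{} {{} "string with unmatched }" }
}
}"""

# Stand-ins for the positions the question says are already known.
open_pos = file_str.index("blah {") + len("blah ")
close_pos = file_str.index("\n}\n") + 1
print(repr(lines_between(file_str, open_pos, close_pos)))
# '{} {{} "string with unmatched }" }\n'
```

The lookups are plain find and rfind calls that start at the known positions. They can therefore be repeated after the string changes without building an index.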
Comments (2)
Instead of matching just the keyword, match everything from the start of the line to the keyword. You could use re.finditer() (see the docs) to get an iterator that keeps yielding matches as it finds them.
Note that the first line is matched only once even though it contains two occurrences of amet. This is because we do a greedy match on .*, so the first amet on the line is consumed by the .*.
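The answer's original code and output are not preserved. Below is a minimal sketch of the approach it describes, using re.finditer with a "^.*keyword" pattern and re.M. The sample text is made up for illustration, and amet stands in for the token.

```python
import re

# Made-up sample text; the answer's own example is not preserved here.
text = (
    "lorem ipsum dolor sit amet, consectetur amet\n"
    "sed do eiusmod\n"
    "tempor amet incididunt\n"
)

# With re.M, ^ matches at the start of every line. The greedy .* cannot
# cross a newline (re.S is not set), so each match runs from a line start
# to the LAST "amet" on that line, and m.start() is that line's start index.
matches = [(m.start(), m.group()) for m in re.finditer(r"^.*amet", text, re.M)]
for start, matched in matches:
    print(start, repr(matched))
```

Each match's start() is directly the beginning of a line that contains the token, so no backward search is needed. A line with two occurrences, like the first one here, still yields a single match.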
You don't need to use a regex to find the beginning of the lines containing the token.
This iterates over the file line by line, builds the string foo from the file's content, and records where the newlines are in a list named line_pos_with_token.
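The answer's code is not preserved either. The following is a sketch based on its description. It reads "where the newlines are" as the start offsets of the lines that contain the token, which is an interpretation rather than something the answer states. The names foo and line_pos_with_token come from the answer. The helper read_with_token_lines is hypothetical, and io.StringIO stands in for the real file.

```python
import io


def read_with_token_lines(f, token):
    """Hypothetical helper: read a text file object line by line, returning
    the full content and the start offset of each line that contains token."""
    parts = []                # lines collected in order
    line_pos_with_token = []  # start index (into the joined string) of matching lines
    offset = 0                # index in the joined string where `line` begins
    for line in f:
        if token in line:
            line_pos_with_token.append(offset)
        parts.append(line)
        offset += len(line)
    foo = "".join(parts)
    return foo, line_pos_with_token


# io.StringIO stands in for open("foo.txt") so the example runs on its own.
foo, line_pos_with_token = read_with_token_lines(
    io.StringIO("a\nx token\nb\ntoken y\n"), "token"
)
print(line_pos_with_token)  # [2, 12]
```

The question notes that the file string can change between iterations. Offsets collected this way would therefore have to be recomputed after each change.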