Matching a multiline regex in a file object
How can I extract the groups from this regex from a file object (data.txt)?
import numpy as np
import re
import os
ifile = open("data.txt",'r')
# Regex pattern
pattern = re.compile(r"""
^Time:(\d{2}:\d{2}:\d{2}) # Time: 12:34:56 at beginning of line
\r{2} # Two carriage return
\D+ # 1 or more non-digits
storeU=(\d+\.\d+)
\s
uIx=(\d+)
\s
storeI=(-?\d+.\d+)
\s
iIx=(\d+)
\s
avgCI=(-?\d+.\d+)
""", re.VERBOSE | re.MULTILINE)
time = []
for line in ifile:
    match = re.search(pattern, line)
    if match:
        time.append(match.group(1))
The problem in the last part of the code is that I iterate line by line, which obviously doesn't work with a multiline regex. I have tried to use pattern.finditer(ifile) like this:
for match in pattern.finditer(ifile):
    print(match)
... just to see if it works, but the finditer method requires a string or buffer.
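A small sketch of both failure modes, assuming io.StringIO as a stand-in for open("data.txt") and using only the first two lines of the question's regex: iterating line by line never sees the two consecutive newlines, and finditer() rejects the file object itself with a TypeError. The sample text is hypothetical.

```python
import io
import re

# Shortened version of the question's pattern: a time line followed by two newlines
pattern = re.compile(r"^Time:(\d{2}:\d{2}:\d{2})\n{2}", re.MULTILINE)

# Hypothetical file contents; io.StringIO behaves like an open text file
fake_file = io.StringIO("Time:12:34:56\n\nstoreU=1.25\n")

# Line by line: no single line contains two newlines, so nothing matches
line_matches = [m for m in map(pattern.search, fake_file) if m]

# Passing the file object itself: finditer() only accepts str or bytes-like objects
fake_file.seek(0)
try:
    list(pattern.finditer(fake_file))
    raised = False
except TypeError:
    raised = True

print(line_matches, raised)  # [] True
```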
I have also tried this, but can't get it to work either:
matches = [m.groups() for m in pattern.finditer(ifile)]
Any ideas?
After comments from Mike and Tuomas, I was told to use .read(), something like this:
ifile = open("data.txt",'r').read()
This works fine, but is it the correct way to search through the file? I also can't get the following loop to work:
for i in pattern.finditer(ifile):
    match = re.search(pattern, i)
    if match:
        time.append(match.group(1))
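The loop above fails because each item yielded by finditer() is already a Match object, while re.search() expects a string. A minimal sketch of the same loop without the second search, assuming a hypothetical two-record sample whose layout is inferred from the regex (not taken from the real data.txt), with the decimal points escaped:

```python
import re

# Hypothetical sample text in the layout the regex expects
text = (
    "Time:12:34:56\n\n"
    "Meas: storeU=1.25 uIx=3 storeI=-0.75 iIx=2 avgCI=0.10\n"
    "Time:12:35:00\n\n"
    "Meas: storeU=1.30 uIx=4 storeI=-0.70 iIx=5 avgCI=0.12\n"
)

pattern = re.compile(r"""
^Time:(\d{2}:\d{2}:\d{2})   # Time: 12:34:56 at beginning of line
\n{2}                       # two newlines
\D+                         # 1 or more non-digits
storeU=(\d+\.\d+)
\s
uIx=(\d+)
\s
storeI=(-?\d+\.\d+)
\s
iIx=(\d+)
\s
avgCI=(-?\d+\.\d+)
""", re.VERBOSE | re.MULTILINE)

times = []
for match in pattern.finditer(text):
    # each item is already a Match object; no second re.search() is needed
    times.append(match.group(1))

print(times)  # ['12:34:56', '12:35:00']
```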
Solution
import re

# Open the file, read its whole content into one string,
# and let the with statement close the file afterwards
with open("data.txt", 'r') as ifile:
    text = ifile.read()

# Regex pattern
pattern_meas = re.compile(r"""
^Time:(\d{2}:\d{2}:\d{2}) # Time: 12:34:56 at beginning of line
\n{2} # Two newlines
\D+ # 1 or more non-digits
storeU=(\d+\.\d+) # Decimal number
\s
uIx=(\d+) # Fetch uIx variable
\s
storeI=(-?\d+\.\d+) # Fetch storeI variable
\s
iIx=(\d+) # Fetch iIx variable
\s
avgCI=(-?\d+\.\d+) # Fetch avgCI variable
""", re.VERBOSE | re.MULTILINE)

# Write one line (six fields) per matched record
with open("output_times.txt", "w") as file_times:
    for match in pattern_meas.finditer(text):
        output = "%s,\t%s,\t\t%s,\t%s,\t\t%s,\t%s\n" % match.groups()
        file_times.write(output)
Maybe it can be written in a more compact and Pythonic way, though...
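One possible more compact variant (a sketch, not the author's code) uses named groups, (?P<name>...), and Match.groupdict(), so each record comes back as a dict keyed by its label. The function name parse_measurements and the sample record are hypothetical.

```python
import re

# Named groups make each field self-describing; names follow the data labels
MEAS_RE = re.compile(r"""
    ^Time:(?P<time>\d{2}:\d{2}:\d{2})\n{2}\D+
    storeU=(?P<storeU>\d+\.\d+)\s
    uIx=(?P<uIx>\d+)\s
    storeI=(?P<storeI>-?\d+\.\d+)\s
    iIx=(?P<iIx>\d+)\s
    avgCI=(?P<avgCI>-?\d+\.\d+)
""", re.VERBOSE | re.MULTILINE)


def parse_measurements(text):
    """Hypothetical helper: return one dict per record, keyed by group name."""
    return [m.groupdict() for m in MEAS_RE.finditer(text)]


sample = "Time:12:34:56\n\nMeas: storeU=1.25 uIx=3 storeI=-0.75 iIx=2 avgCI=0.10\n"
records = parse_measurements(sample)
print(records[0]["time"], records[0]["avgCI"])  # 12:34:56 0.10
```

All values stay strings; converting them with float() or int() is left to the caller.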
Comments (3)
You can read the data from the file object into a string with ifile.read().
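A minimal sketch of that suggestion, using a with block so the file is also closed. The temporary file stands in for data.txt and its contents are hypothetical; the pattern is shortened to the time line plus the two newlines.

```python
import os
import re
import tempfile

pattern = re.compile(r"^Time:(\d{2}:\d{2}:\d{2})\n{2}", re.MULTILINE)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for data.txt
    path = os.path.join(tmp, "data.txt")
    with open(path, "w") as out:
        out.write("Time:12:34:56\n\nstoreU=1.25\nTime:12:35:00\n\nstoreU=1.30\n")

    # read() returns the whole file as one str; leaving the with block closes it
    with open(path, "r") as ifile:
        text = ifile.read()

times = [m.group(1) for m in pattern.finditer(text)]
print(times, ifile.closed)  # ['12:34:56', '12:35:00'] True
```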
finditer yields MatchObjects. If the regex doesn't match anything, times will be an empty list. You can also modify your regex to use non-capturing groups for storeU, storeI, iIx and avgCI; then pattern.findall will return only the matched times. Note: naming a variable time might shadow the standard library module time; times would be a better option.
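A sketch of the non-capturing-group suggestion, with a hypothetical sample record: only the time stays in a capturing group, so findall() returns a plain list of time strings. The finditer() list comprehension gives the same result from Match objects.

```python
import re

# Hypothetical sample record
text = (
    "Time:12:34:56\n\n"
    "Meas: storeU=1.25 uIx=3 storeI=-0.75 iIx=2 avgCI=0.10\n"
)

# Only the time is captured; (?:...) groups match without capturing
pattern = re.compile(r"""
^Time:(\d{2}:\d{2}:\d{2})
\n{2}
\D+
storeU=(?:\d+\.\d+)\s
uIx=(?:\d+)\s
storeI=(?:-?\d+\.\d+)\s
iIx=(?:\d+)\s
avgCI=(?:-?\d+\.\d+)
""", re.VERBOSE | re.MULTILINE)

# With exactly one capturing group, findall returns a list of strings
times = pattern.findall(text)

# Equivalent with finditer, which yields Match objects
times_from_iter = [m.group(1) for m in pattern.finditer(text)]

print(times, times_from_iter)  # ['12:34:56'] ['12:34:56']
```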
Why don't you read the whole file into a buffer and then do a search on that?
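Reading into a str is the simplest option. For very large files, a different technique (not what the comment describes) is to memory-map the file with the standard mmap module; re can search the mapped object directly because it is bytes-like, but the pattern must then be a bytes pattern. A sketch, assuming a hypothetical temporary file in place of data.txt:

```python
import mmap
import os
import re
import tempfile

# Bytes pattern: an mmap exposes bytes, so the regex must be bytes too
pattern = re.compile(rb"^Time:(\d{2}:\d{2}:\d{2})\n\n", re.MULTILINE)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for data.txt
    path = os.path.join(tmp, "data.txt")
    with open(path, "wb") as out:
        out.write(b"Time:12:34:56\n\nstoreU=1.25\nTime:12:35:00\n\nstoreU=1.30\n")

    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
        # re searches the mapped buffer directly, without building a str first
        times = [m.group(1).decode("ascii") for m in pattern.finditer(buf)]

print(times)  # ['12:34:56', '12:35:00']
```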