Matching a multiline regular expression against a file object

Posted 2024-08-25 10:35:19 · 2663 characters · 2 views · 0 comments

How can I extract the groups of this regex from the contents of a file object (data.txt)?

import numpy as np
import re
import os
ifile = open("data.txt",'r')

# Regex pattern
pattern = re.compile(r"""
                ^Time:(\d{2}:\d{2}:\d{2})   # Time: 12:34:56 at beginning of line
                \r{2}                       # Two carriage return
                \D+                         # 1 or more non-digits
                storeU=(\d+\.\d+)
                \s
                uIx=(\d+)
                \s
                storeI=(-?\d+.\d+)
                \s
                iIx=(\d+)
                \s
                avgCI=(-?\d+.\d+)
                """, re.VERBOSE | re.MULTILINE)

time = [];

for line in ifile:
    match = re.search(pattern, line)
    if match:
        time.append(match.group(1))

The problem with the last part of the code is that I iterate line by line, which obviously doesn't work with a multiline regex. I have tried to use pattern.finditer(ifile) like this:

for match in pattern.finditer(ifile):
    print match

... just to see whether it works, but the finditer method requires a string or buffer, not a file object.

I have also tried this, but can't get it to work:

matches = [m.groups() for m in pattern.finditer(ifile)]

Any ideas?


After comments from Mike and Tuomas, I was told to use .read(), something like this:

ifile = open("data.txt",'r').read()

Reading the file this way works fine, but is this the correct way to search through it? I can't get the following loop to work:

for i in pattern.finditer(ifile):
    match = re.search(pattern, i)
    if match:
        time.append(match.group(1))

Solution

# Open the file and read its entire contents into a string;
# the with-block closes the file object automatically
with open("data.txt", 'r') as ifile:
    text = ifile.read()

# Regex pattern
pattern_meas = re.compile(r"""
                ^Time:(\d{2}:\d{2}:\d{2})   # Time: 12:34:56 at beginning of line
                \n{2}                       # Two newlines
                \D+                         # 1 or more non-digits
                storeU=(\d+\.\d+)           # Decimal-number
                \s
                uIx=(\d+)                   # Fetch uIx-variable
                \s
                storeI=(-?\d+\.\d+)         # Fetch storeI-variable (dot escaped to match a literal ".")
                \s
                iIx=(\d+)                   # Fetch iIx-variable
                \s
                avgCI=(-?\d+\.\d+)          # Fetch avgCI-variable (dot escaped to match a literal ".")
                """, re.VERBOSE | re.MULTILINE)

# Write one row per match; the with-block closes the output file
with open("output_times.txt", "w") as file_times:
    for match in pattern_meas.finditer(text):
        # The pattern has exactly six groups, one per format field
        output = "%s,\t%s,\t\t%s,\t%s,\t\t%s,\t%s\n" % match.groups()
        file_times.write(output)

It could probably be written more compactly and more Pythonically, though.
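One possible more compact form is sketched below. It is only an illustration: the layout of data.txt is not shown in the question, so the SAMPLE text is an assumed input that happens to fit the pattern, the helper name extract_rows is hypothetical, and the output separator is simplified to a single ",\t" between all fields.

```python
import re
import tempfile
from pathlib import Path

PATTERN = re.compile(r"""
    ^Time:(\d{2}:\d{2}:\d{2})   # time stamp at the start of a line
    \n{2}                       # two newlines
    \D+                         # one or more non-digits
    storeU=(\d+\.\d+)\s
    uIx=(\d+)\s
    storeI=(-?\d+\.\d+)\s
    iIx=(\d+)\s
    avgCI=(-?\d+\.\d+)
    """, re.VERBOSE | re.MULTILINE)


def extract_rows(text):
    """Return one tuple of six strings per match (hypothetical helper)."""
    return [m.groups() for m in PATTERN.finditer(text)]


# Assumed sample input; the real data.txt layout is not given in the question.
SAMPLE = (
    "Time:12:34:56\n\n"
    "measurement\n"
    "storeU=1.25 uIx=3 storeI=-0.50 iIx=7 avgCI=-1.75\n"
    "Time:12:35:01\n\n"
    "measurement\n"
    "storeU=2.00 uIx=4 storeI=0.25 iIx=8 avgCI=0.10\n"
)

with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "data.txt"
    data_path.write_text(SAMPLE)

    # Read the whole file into one string so the regex can span lines.
    with open(data_path) as ifile:
        rows = extract_rows(ifile.read())

    out_path = Path(tmp) / "output_times.txt"
    with open(out_path, "w") as out:
        for row in rows:
            out.write(",\t".join(row) + "\n")

    written = out_path.read_text()
```

Keeping the matching in a small function makes it easy to test against a string without touching the file system.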


Comments (3)

可遇━不可求 2024-09-01 10:35:19

You can read the data from the file object into a string with ifile.read()

伴随着你 2024-09-01 10:35:19
times = [match.group(1) for match in pattern.finditer(ifile.read())]

finditer yields match objects. If the regex doesn't match anything, times will be an empty list.
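A minimal sketch of that behaviour, using a shortened pattern and made-up input strings (both are assumptions for illustration only). io.StringIO stands in for an open file object:

```python
import io
import re

# Shortened, hypothetical pattern: only the time stamp is matched here.
pattern = re.compile(r"^Time:(\d{2}:\d{2}:\d{2})", re.MULTILINE)

text = "Time:12:34:56\nnoise\nTime:12:35:01\n"

# finditer yields one match object per hit; group(1) is the captured time.
times = [m.group(1) for m in pattern.finditer(text)]

# No hit means the comprehension simply produces an empty list.
no_times = [m.group(1) for m in pattern.finditer("nothing here")]

# Passing the file-like object itself instead of its contents fails.
try:
    pattern.finditer(io.StringIO(text))
    raised = False
except TypeError:
    raised = True
```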

You can also modify your regex to use non-capturing groups for storeU, uIx, storeI, iIx and avgCI; pattern.findall will then return only the matched times.
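As a rough sketch of this suggestion: with a single capturing group left in the pattern, findall returns a plain list of strings. The sample text is an assumed input, since the real file layout is not shown, and the name pattern_times is hypothetical.

```python
import re

pattern_times = re.compile(r"""
    ^Time:(\d{2}:\d{2}:\d{2})   # the only capturing group
    \n{2}
    \D+
    storeU=(?:\d+\.\d+)\s       # (?:...) groups without capturing
    uIx=(?:\d+)\s
    storeI=(?:-?\d+\.\d+)\s
    iIx=(?:\d+)\s
    avgCI=(?:-?\d+\.\d+)
    """, re.VERBOSE | re.MULTILINE)

# Assumed sample input that fits the pattern.
text = (
    "Time:12:34:56\n\n"
    "measurement\n"
    "storeU=1.25 uIx=3 storeI=-0.50 iIx=7 avgCI=-1.75\n"
    "Time:12:35:01\n\n"
    "measurement\n"
    "storeU=2.00 uIx=4 storeI=0.25 iIx=8 avgCI=0.10\n"
)

# With exactly one capturing group, findall returns a list of strings.
times = pattern_times.findall(text)
```

The rest of the pattern still has to match, so a time stamp that is not followed by a complete measurement line is skipped.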

Note: naming a variable time may shadow the standard library module of the same name; times would be a better choice.
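The shadowing only bites if the script also imports the time module, which the question's code does not do; the sketch below assumes such an import to show the effect.

```python
import time

start = time.time()   # works: the name `time` refers to the module

time = []             # rebinding the name hides the module from here on

try:
    time.time()       # a list has no attribute called `time`
    shadowed = False
except AttributeError:
    shadowed = True
```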

小梨窩很甜 2024-09-01 10:35:19

Why don't you read the whole file into a buffer using

buffer = open("data.txt").read()

and then do a search with that?
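A small sketch of why the whole buffer is needed: a pattern that spans a line break can never match when each line is searched on its own. The shortened pattern and the sample content are assumptions for illustration, and io.StringIO stands in for the file opened from data.txt.

```python
import io
import re

# Hypothetical shortened pattern that spans three lines of input.
pattern = re.compile(r"^Time:(\d{2}:\d{2}:\d{2})\n{2}(\w+)", re.MULTILINE)

content = "Time:12:34:56\n\nmeasurement\n"

# Line by line: no single line contains the full multi-line pattern.
per_line = [
    m.group(1)
    for line in io.StringIO(content)
    for m in pattern.finditer(line)
]

# Whole buffer: the pattern can cross the line breaks.
buffer = io.StringIO(content).read()
match = pattern.search(buffer)
whole = match.groups() if match else None
```

Reading everything at once keeps the code simple, at the cost of holding the entire file in memory.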
