如何在Python中不使用正则表达式将文本格式与字符串匹配？

发布于 2024-11-01 05:13:09 字数 572 浏览 0 评论 0原文

我正在阅读一个文件，其中的行格式为

[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34

我看到 Matlab 代码读取这个文件，

[I,L,Ls,R,Rs,p,e,n] = textread(f1,'[ %u ] L= %u%s R= %u%s p= %n e=%u n=%u')

我想用Python读取这个文件。我唯一知道的是正则表达式，即使阅读这一行的一部分也会导致类似

re.compile('\s*\[\s*(?P<id>\d+)\s*\]\s*L\s*=\s*(?P<Lint>\d+)\s*\((?P<Ltype>[DG])\)\s*R\s*=\s*(?P<Rint>\d+)\s*')

丑陋的事情！在 Python 中是否有更简单的方法来做到这一点？

原文

I am reading a file with lines of the form exemplified by

[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34

I saw Matlab code to read this file given by

[I,L,Ls,R,Rs,p,e,n] = textread(f1,'[ %u ] L= %u%s R= %u%s p= %n e=%u n=%u')

I want to read this file in Python. The only thing I know of is regex, and reading even a part of this line leads to something like

re.compile('\s*\[\s*(?P<id>\d+)\s*\]\s*L\s*=\s*(?P<Lint>\d+)\s*\((?P<Ltype>[DG])\)\s*R\s*=\s*(?P<Rint>\d+)\s*')

which is ugly! Is there an easier way to do this in Python?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

风蛊 2024-11-08 05:13:10

您可以通过使用转义/替换构建正则表达式，使其更具可读性...

number = "([-+0-9.DdEe ]+)"
unit = r"\(([^)]+)\)"
t = "[X] L=XU R=XU p=X e=X n=X"
m = re.compile(re.escape(t).replace("X", number).replace("U", unit))

You can make the regexp more readable by building it with escape/replace...

number = "([-+0-9.DdEe ]+)"
unit = r"\(([^)]+)\)"
t = "[X] L=XU R=XU p=X e=X n=X"
m = re.compile(re.escape(t).replace("X", number).replace("U", unit))

回复收藏 0 原文

江湖正好 2024-11-08 05:13:10

对我来说，这看起来或多或少是Pythonic的：

line = "[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"

parts = (None, int, None,
         None, int, str,
         None, int, str,
         None, float,
         None, int,
         None, int)

[I,L,Ls,R,Rs,p,e,n] = [f(x) for f, x in zip(parts, line.split()) if f is not None]

print [I,L,Ls,R,Rs,p,e,n]

This looks more or less pythonic to me:

line = "[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"

parts = (None, int, None,
         None, int, str,
         None, int, str,
         None, float,
         None, int,
         None, int)

[I,L,Ls,R,Rs,p,e,n] = [f(x) for f, x in zip(parts, line.split()) if f is not None]

print [I,L,Ls,R,Rs,p,e,n]

回复收藏 0 原文

岁月无声 2024-11-08 05:13:10

Pyparsing 是不可读且脆弱的正则表达式处理器的后备方案。下面的解析器示例处理您指定的格式，加上任何类型的额外空格以及赋值表达式的任意顺序。正如您在正则表达式中使用命名组一样，pyparsing 支持结果名称，以便您可以使用 dict 或属性语法（data['Lint'] 或 data.Lint）访问解析的数据。

from pyparsing import Suppress, Word, nums, oneOf, Regex, ZeroOrMore, Optional

# define basic punctuation
EQ,LPAR,RPAR,LBRACK,RBRACK = map(Suppress,"=()[]")

# numeric values
integer = Word(nums).setParseAction(lambda t : int(t[0]))
real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda t : float(t[0]))

# id and assignment fields
idRef = LBRACK + integer("id") + RBRACK
typesep = LPAR + oneOf("D G") + RPAR
lExpr = 'L' + EQ + integer("Lint")
rExpr = 'R' + EQ + integer("Rint")
pExpr = 'p' + EQ + real("pFloat")
eExpr = 'e' + EQ + integer("Eint")
nExpr = 'n' + EQ + integer("Nint")

# accept assignments in any order, with or without leading (D) or (G)
assignment = lExpr | rExpr | pExpr | eExpr | nExpr
line = idRef + lExpr + ZeroOrMore(Optional(typesep) + assignment)


# test the parser
text = "[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"
data = line.parseString(text)
print data.dump()


# prints
# [0, 'L', 9, 'D', 'R', 14, 'D', 'p', 0.034722200000000002, 'e', 10, 'n', 34]
# - Eint: 10
# - Lint: 9
# - Nint: 34
# - Rint: 14
# - id: 0
# - pFloat: 0.0347222

此外，解析操作在解析时执行 string->int 或 string->float 转换，以便之后这些值已经处于可用的形式。（pyparsing 的想法是，在解析这些表达式时，您知道由数字组成的单词 - 或 Word(nums) - 将安全地转换为 int，所以为什么不正确进行转换然后，不再只是返回匹配的字符串并重新处理字符串序列，而是尝试检测哪些是整数、浮点数等？）

Pyparsing is a fallback from unreadable and fragile regex processors. The parser example below handles your stated format, plus any variety of extra whitespace, and arbitrary order of the assignment expressions. Just as you have used named groups in your regex, pyparsing supports results names, so that you can access the parsed data using dict or attribute syntax (data['Lint'] or data.Lint).

from pyparsing import Suppress, Word, nums, oneOf, Regex, ZeroOrMore, Optional

# define basic punctuation
EQ,LPAR,RPAR,LBRACK,RBRACK = map(Suppress,"=()[]")

# numeric values
integer = Word(nums).setParseAction(lambda t : int(t[0]))
real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda t : float(t[0]))

# id and assignment fields
idRef = LBRACK + integer("id") + RBRACK
typesep = LPAR + oneOf("D G") + RPAR
lExpr = 'L' + EQ + integer("Lint")
rExpr = 'R' + EQ + integer("Rint")
pExpr = 'p' + EQ + real("pFloat")
eExpr = 'e' + EQ + integer("Eint")
nExpr = 'n' + EQ + integer("Nint")

# accept assignments in any order, with or without leading (D) or (G)
assignment = lExpr | rExpr | pExpr | eExpr | nExpr
line = idRef + lExpr + ZeroOrMore(Optional(typesep) + assignment)


# test the parser
text = "[ 0 ] L= 9 (D) R= 14 (D) p= 0.0347222 e= 10 n= 34"
data = line.parseString(text)
print data.dump()


# prints
# [0, 'L', 9, 'D', 'R', 14, 'D', 'p', 0.034722200000000002, 'e', 10, 'n', 34]
# - Eint: 10
# - Lint: 9
# - Nint: 34
# - Rint: 14
# - id: 0
# - pFloat: 0.0347222

Also, the parse actions do the string->int or string->float conversion at parse time, so that afterward the values are already in a usable form. (The thinking in pyparsing is that, while parsing these expressions, you know that a word composed of numeric digits - or Word(nums) - will safely convert to an int, so why not do the conversion right then, instead of just getting back matching strings and having to re-process the sequence of strings, trying to detect which ones are integers, floats, etc.?)

回复收藏 0 原文