Python multiline regex + reading multiple entries from a file in one go
//Last modified: Sat, Apr 16, 2011 09:55:04 AM
//Codeset: ISO-8859-1
fileInfo "version" "20x64";
createNode newnode -n "a_SET";
addAttr -ci true -k true -sn "connections" -ln "connections" -dt "string";
setAttr -l on -k off ".tx";
setAttr -l on -k off ".ty";
setAttr -l on -k off ".sz";
setAttr -l on -k on ".test1" -type "string" "blabla";
setAttr -l on -k on ".test2" -type "string" "blablabla";
createNode newnode -n "b_SET";
addAttr -ci true -k true -sn "connections" -ln "connections" -dt "string";
setAttr -l on -k off ".tx";
setAttr -l on -k off ".ty";
setAttr -l on -k off ".sz";
setAttr -l on -k on ".test1" -type "string" "hmm";
setAttr -l on -k on ".test2" -type "string" "ehmehm";
In Python:
I need to read the newnode names, for instance "a_SET" and "b_SET", and their corresponding attribute values, giving {"a_SET": {"test1": "blabla", "test2": "blablabla"}} and the same for b_SET. There could be an unknown number of sets, such as c_SET, d_SET, etc.
I've tried looping through the lines and matching each one:
import re

sets = []
for line in fileopened:
    # group(2) is the node name without the "_SET" suffix, e.g. "a"
    setmatch = re.match(r'^(createNode newnode -n ")(.*)(_SET)(.*)', line)
    if setmatch:
        sets.append(setmatch.group(2))
As soon as I find a match here, I would loop through the following lines to collect the attributes (test1, test2) for that set, until I reach a new set (for instance c_SET) or EOF.
What would be the best way to grab all that info in one go with re.MULTILINE?
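For reference, the line-by-line approach described above can be sketched roughly as follows. This is only an illustration: SAMPLE is a shortened, hypothetical copy of the file, and NODE_RE, ATTR_RE and parse_lines are names invented for this sketch, not part of the original attempt.

```python
import re

# Hypothetical, shortened copy of the file shown in the question.
SAMPLE = '''createNode newnode -n "a_SET";
    setAttr -l on -k off ".tx";
    setAttr -l on -k on ".test1" -type "string" "blabla";
    setAttr -l on -k on ".test2" -type "string" "blablabla";
createNode newnode -n "b_SET";
    setAttr -l on -k on ".test1" -type "string" "hmm";
    setAttr -l on -k on ".test2" -type "string" "ehmehm";
'''

# Assumed line shapes: a node header, and a string-typed setAttr line.
NODE_RE = re.compile(r'^\s*createNode \w+ -n "(\w+_SET)";')
ATTR_RE = re.compile(r'^\s*setAttr .*"\.(\w+)" -type "string" "(.*)";')

def parse_lines(lines):
    """Collect {node_name: {attr: value}} by tracking the current node."""
    result = {}
    current = None  # dict of the node we are currently inside, if any
    for line in lines:
        node = NODE_RE.match(line)
        if node:
            current = result.setdefault(node.group(1), {})
            continue
        attr = ATTR_RE.match(line)
        if attr and current is not None:
            current[attr.group(1)] = attr.group(2)
    return result

parsed = parse_lines(SAMPLE.splitlines())
```

Here parsed maps each set name to its string attributes; setAttr lines without a -type "string" part (such as ".tx") are skipped.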
Comments (3)
You can use a regex positive lookahead to split the groups, so that each group stops just before the next one starts.
If you want the results in a dict, you can build one from the output of findall.
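A minimal sketch of the lookahead idea, not the answerer's exact code: SAMPLE is a shortened, hypothetical copy of the file, and BLOCK_RE and ATTR_RE are names and patterns assumed for illustration. Each block runs from a createNode line up to, but not including, the next createNode or the end of the text.

```python
import re

# Hypothetical, shortened copy of the file shown in the question.
SAMPLE = '''createNode newnode -n "a_SET";
    setAttr -l on -k off ".tx";
    setAttr -l on -k on ".test1" -type "string" "blabla";
    setAttr -l on -k on ".test2" -type "string" "blablabla";
createNode newnode -n "b_SET";
    setAttr -l on -k on ".test1" -type "string" "hmm";
    setAttr -l on -k on ".test2" -type "string" "ehmehm";
'''

# The positive lookahead (?=...) checks for the next "createNode" (or the end
# of the text) without consuming it, so the next block can still match.
BLOCK_RE = re.compile(r'createNode \w+ -n "(\w+)";(.*?)(?=createNode|\Z)',
                      re.DOTALL)

# Only string-typed attributes are captured; [^\n]*? keeps a match on one line.
ATTR_RE = re.compile(r'setAttr [^\n]*?"\.(\w+)" -type "string" "([^"]*)";')

blocks = BLOCK_RE.findall(SAMPLE)  # list of (node_name, block_body) tuples
result = {name: dict(ATTR_RE.findall(body)) for name, body in blocks}
```

Note that re.DOTALL, not re.MULTILINE, is what lets `.` run across line breaks here; re.MULTILINE only changes where `^` and `$` match. The sketch also assumes that the word createNode never appears inside an attribute value.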
I got this:
result
Question:
What if the strings must contain the character '"'? How is it represented?
EDIT
I had some difficulty finding the solution because I did not take the easy route.
Here's a new pattern that catches the FIRST string "..." and the LAST string "..." present after a " setAttr" and before the next " setAttr". So several "..." strings can be present, not only 3. You didn't ask for this, but I thought it might be needed.
I also made it possible for newlines to appear inside the strings being caught, as in "....\n......", not only around them. For that, I had to come up with something new to me: (?:\n(?! *setAttr)|[^"\n])
This means: every character except '"' and an ordinary newline \n is accepted, and a newline is accepted only when it is not followed by a line beginning with ' *setAttr'.
For (?:\n(?! *setAttr)|.) it means: newlines not followed by a line beginning with ' *setAttr', plus all the other non-newline characters. Hence any other special character, such as a tab, is automatically accepted in the matches.
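To illustrate how the sub-pattern (?:\n(?! *setAttr)|[^"\n]) behaves, here is a small sketch. It is not the full pattern from this answer: SAMPLE, CHAR and ATTR_RE are hypothetical names, and the surrounding regex is an assumption made for the demonstration.

```python
import re

# Hypothetical input where the ".test1" value contains a line break.
SAMPLE = '''createNode newnode -n "a_SET";
    setAttr -l on -k on ".test1" -type "string" "bla
bla";
    setAttr -l on -k on ".test2" -type "string" "blablabla";
'''

# One "allowed character" inside a quoted value: either a newline that is NOT
# followed by a line starting with optional spaces and "setAttr", or any
# character other than a double quote or a newline.
CHAR = r'(?:\n(?! *setAttr)|[^"\n])'

# Assumed wrapper: attribute name in group 1, quoted value in group 2.
ATTR_RE = re.compile(
    r'setAttr [^\n"]*"\.(\w+)" -type "string" "(' + CHAR + r'*)";'
)

pairs = ATTR_RE.findall(SAMPLE)
attrs = dict(pairs)

# A newline directly followed by a setAttr line is rejected by CHAR.
rejected = re.match(CHAR, '\n    setAttr')
```

With this input, attrs["test1"] keeps the embedded newline, while rejected is None because the negative lookahead refuses a newline that leads into the next setAttr line.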
result
Another possible option:
As you can see, the ".test1" value is now split by a \n line separator. How would you work around that using eyquem's approach?