Parsing a git log file with Python
I need to parse something like this:
commit e397a6e988c05d6fd87ae904303ec0e17f4d79a2
Author: Name <[email protected]>
Date: Sat Jul 9 21:29:10 2011 +0400
commit message
1 files changed, 21 insertions(+), 11 deletions(-)
and get the author name and the number of insertions and deletions.
For the name I have this:
re.findall(r"Author: (.+) <",gitLog)
For the numbers I have this:
re.findall(r" (\d+) insertions\S+, (\d+) deletions",gitLog)
But I want to get a list of tuples of (name, insertions, deletions) with a single regular expression.
I tried to do something like
re.findall(r"Author: (.+) <.+ (\d+) insertions\S+, (\d+) deletions",gitLog,re.DOTALL)
but it returns nothing...
So what is my mistake? What should the regular expression look like?
UPDATE:
wRAR is right, but somehow when I read a file and try to parse it, I get the whole file as the name, followed by the last insertion and deletion counts. So it matches the whole file rather than a single commit... The .+ consumes the whole file instead of just part of one commit...
Comments (4)
If you have access to the repo and not just some text dump of git log, save yourself the parsing trouble and generate different log output:
This will produce output of the form:
which you don't even need a regex for. If you want to stay with regex, you need to match the (+) after insertions, or else it will not match at all and will not capture the numbers.

You should use (directly or by borrowing the code) an existing package such as GitPython. As for your regex question, the provided regex applied to the provided text returns
[('Name', '21', '11')]
so I suppose it is right.

There is a module that I have used for parsing Git logs with Python. It looks actively maintained:
https://github.com/gaborantal/git-log-parser
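
The first comment's suggestion (ask git for a simpler log format instead of parsing the default one) could be sketched roughly as follows. The exact command from that comment is not shown above, so the format string here is an assumption: it uses git's --pretty=format: option with the %an (author name) placeholder together with --shortstat, and the AUTHOR: prefix and the function names are made up for illustration.

```python
import re
import subprocess

# Hypothetical marker placed in front of each author name so that author
# lines are easy to tell apart from the --shortstat summary lines.
PREFIX = "AUTHOR:"

STAT_RE = re.compile(r"^\s*\d+ files? changed")
INS_RE = re.compile(r"(\d+) insertions?\(\+\)")
DEL_RE = re.compile(r"(\d+) deletions?\(-\)")


def read_log(repo_path="."):
    """Run git log with an assumed custom format and return its text output."""
    cmd = ["git", "log", "--pretty=format:" + PREFIX + "%an", "--shortstat"]
    done = subprocess.run(cmd, cwd=repo_path, capture_output=True,
                          text=True, check=True)
    return done.stdout


def parse_author_stats(text):
    """Return a list of (author, insertions, deletions) tuples."""
    results = []
    author = None
    for line in text.splitlines():
        if line.startswith(PREFIX):
            author = line[len(PREFIX):].strip()
        elif author is not None and STAT_RE.match(line):
            ins = INS_RE.search(line)
            dels = DEL_RE.search(line)
            # A summary line may omit either count; treat a missing one as 0.
            results.append((author,
                            int(ins.group(1)) if ins else 0,
                            int(dels.group(1)) if dels else 0))
            author = None
    return results
```

Calling parse_author_stats(read_log("path/to/repo")) would then give the tuples directly. Commits without a summary line (merge commits, for example) are simply skipped in this sketch.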
So the answer to my question is:
But thanks for your answers anyway.
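
The behaviour described in the question's update (the whole file ending up in the name group) is what greedy .+ does under re.DOTALL. One possible way around it, as a minimal sketch and not necessarily the expression the asker ended up with, is to make the quantifiers non-greedy and keep the name from crossing a line or a <. The sample log, its e-mail addresses and the function name are invented for illustration, and the sketch assumes every commit's summary line contains both an insertion and a deletion count.

```python
import re

# Non-greedy quantifiers stop at the first possible match, so each match
# stays within one commit instead of stretching to the end of the file.
COMMIT_RE = re.compile(
    r"Author: ([^<\n]+?) <"       # author name, up to the e-mail bracket
    r".+?"                        # rest of the commit, as little as possible
    r"(\d+) insertions?\(\+\), "  # insertion count, with the literal (+)
    r"(\d+) deletions?\(-\)",     # deletion count, with the literal (-)
    re.DOTALL,
)


def author_stats(log_text):
    """Return (name, insertions, deletions) tuples, one per matched commit."""
    return [(name, int(ins), int(dels))
            for name, ins, dels in COMMIT_RE.findall(log_text)]


# Made-up two-commit log in the same shape as the sample in the question.
SAMPLE_LOG = """commit e397a6e988c05d6fd87ae904303ec0e17f4d79a2
Author: Name <name@example.com>
Date:   Sat Jul 9 21:29:10 2011 +0400

    commit message

 1 files changed, 21 insertions(+), 11 deletions(-)

commit 0000000000000000000000000000000000000000
Author: Other Name <other@example.com>
Date:   Sun Jul 10 10:00:00 2011 +0400

    another message

 2 files changed, 3 insertions(+), 4 deletions(-)
"""

print(author_stats(SAMPLE_LOG))
```

If some commits only insert or only delete, a single pattern like this can pair an author with the next commit's numbers. Splitting the text into per-commit chunks first would be safer in that case.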