How to write a parser for the unified diff syntax
Should I use RegexParsers or StandardTokenParsers, or are these not suitable at all for parsing this kind of syntax? An example of the syntax can be found here.
Comments (4)
I'd use regex. It simplifies a few things and makes the rest standard.
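As one possible illustration of the regex route, here is a minimal sketch that pulls the four numbers out of a hunk header such as "@@ -1,3 +1,4 @@". It uses only scala.util.matching.Regex from the standard library; the names HunkHeader, Pattern and parse are hypothetical and not taken from any answer here.

```scala
import scala.util.matching.Regex

object HunkHeader {
  // Capture groups: old start, optional old count, new start, optional new count.
  // Anything after the closing "@@" (e.g. a section heading) is ignored.
  val Pattern: Regex = """^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@.*$""".r

  // Returns (oldStart, oldCount, newStart, newCount), or None if the line is
  // not a hunk header. An omitted count means 1 in the unified format.
  def parse(line: String): Option[(Int, Int, Int, Int)] = line match {
    case Pattern(os, oc, ns, nc) =>
      // Optional groups that did not match are bound to null, hence Option(...).
      Some((os.toInt, Option(oc).fold(1)(_.toInt), ns.toInt, Option(nc).fold(1)(_.toInt)))
    case _ => None
  }
}
```

Calling HunkHeader.parse on each line that starts with "@@" gives the ranges; the remaining lines can then be classified by their first character.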
This format was designed to be easy to parse. You can do it without any regular expressions and without tokenizing your input: just go line by line and look at the first couple of characters. The file headers and hunk headers require a little more attention, but it's nothing you can't do with split.
Of course, if you want to learn how to use a parsing library, then go for it.
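The line-by-line approach could be sketched roughly as follows. This is only an illustration under some assumptions: the input is a well-formed unified diff, paths in the "---" / "+++" headers are followed by an optional tab-separated timestamp, and all names (LineDiff, FileDiff, Hunk, and so on) are made up for the example. The "little more attention" shows up in one place: the counts from the hunk header are tracked so that a removed line whose text begins with "-- " is not mistaken for a file header.

```scala
object LineDiff {
  sealed trait Line
  final case class Context(text: String) extends Line
  final case class Added(text: String) extends Line
  final case class Removed(text: String) extends Line
  final case class Hunk(oldStart: Int, oldCount: Int, newStart: Int, newCount: Int,
                        lines: Vector[Line])
  final case class FileDiff(oldPath: String, newPath: String, hunks: Vector[Hunk])

  // "-12,3" or "+12" -> (start, count); the count defaults to 1 when omitted.
  private def range(token: String): (Int, Int) = {
    val parts = token.drop(1).split(',')
    (parts(0).toInt, if (parts.length > 1) parts(1).toInt else 1)
  }

  // Drop the "--- " / "+++ " marker and any tab-separated timestamp.
  private def path(header: String): String = header.drop(4).split('\t')(0)

  def parse(diff: String): Vector[FileDiff] = {
    var files = Vector.empty[FileDiff]
    var oldPath = ""
    var oldLeft = 0 // old-file lines still expected in the current hunk
    var newLeft = 0 // new-file lines still expected in the current hunk

    // Append a line to the last hunk of the last file (by copying).
    def addLine(l: Line): Unit = {
      val f = files.last
      val h = f.hunks.last
      files = files.init :+ f.copy(hunks = f.hunks.init :+ h.copy(lines = h.lines :+ l))
    }

    for (line <- diff.split('\n')) {
      if (oldLeft > 0 || newLeft > 0) {
        // Inside a hunk body: the first character decides the kind of line.
        line.headOption match {
          case Some('+')  => addLine(Added(line.drop(1))); newLeft -= 1
          case Some('-')  => addLine(Removed(line.drop(1))); oldLeft -= 1
          case Some('\\') => () // "\ No newline at end of file"
          case _          => addLine(Context(line.drop(1))); oldLeft -= 1; newLeft -= 1
        }
      } else if (line.startsWith("--- ")) {
        oldPath = path(line)
      } else if (line.startsWith("+++ ")) {
        files = files :+ FileDiff(oldPath, path(line), Vector.empty)
      } else if (line.startsWith("@@ ")) {
        val tokens = line.split(' ') // "@@", "-1,3", "+1,4", "@@", ...
        val (os, oc) = range(tokens(1))
        val (ns, nc) = range(tokens(2))
        val f = files.last
        files = files.init :+ f.copy(hunks = f.hunks :+ Hunk(os, oc, ns, nc, Vector.empty))
        oldLeft = oc
        newLeft = nc
      }
      // Other lines ("diff ...", "Index: ...") are ignored in this sketch.
    }
    files
  }
}
```

LineDiff.parse(text) returns one FileDiff per "---" / "+++" pair. The sketch does no error handling: a hunk header that appears before any file header, or a malformed range, will throw.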
Here is a solution using RegexParsers.
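To give an idea of what a RegexParsers grammar for this format can look like, here is a minimal sketch that parses a single hunk. It is an illustration only, not the answerer's code. It assumes the scala-parser-combinators module is on the classpath and that every line, including the last, ends with "\n"; the names HunkParser, Hunk, parseHunk and the line case classes are hypothetical.

```scala
import scala.util.parsing.combinator.RegexParsers

object HunkParser extends RegexParsers {
  // Spaces and newlines are significant in a diff, so do not skip them.
  override def skipWhitespace: Boolean = false

  sealed trait Line
  final case class Context(text: String) extends Line
  final case class Added(text: String) extends Line
  final case class Removed(text: String) extends Line
  final case class Hunk(oldStart: Int, oldCount: Int, newStart: Int, newCount: Int,
                        lines: List[Line])

  private def number: Parser[Int] = """\d+""".r ^^ (_.toInt)

  // The remainder of the current line, consuming (but not returning) the newline.
  private def rest: Parser[String] = """[^\n]*""".r <~ "\n"

  // "12,3" or just "12"; the count defaults to 1 when omitted.
  private def range: Parser[(Int, Int)] =
    number ~ opt("," ~> number) ^^ { case start ~ count => (start, count.getOrElse(1)) }

  // The first character of each body line decides its kind.
  private def line: Parser[Line] =
    ("+" ~> rest ^^ Added.apply) | ("-" ~> rest ^^ Removed.apply) | (" " ~> rest ^^ Context.apply)

  // "@@ -1,3 +1,4 @@ optional heading" followed by the body lines.
  def hunk: Parser[Hunk] =
    ("@@ -" ~> range) ~ (" +" ~> range) ~ (" @@" ~> rest) ~ rep(line) ^^ {
      case (os, oc) ~ (ns, nc) ~ _ ~ ls => Hunk(os, oc, ns, nc, ls)
    }

  def parseHunk(input: String): Either[String, Hunk] =
    parseAll(hunk, input) match {
      case Success(h, _)      => Right(h)
      case failure: NoSuccess => Left(failure.msg)
    }
}
```

HunkParser.parseHunk returns Right(hunk) on success and Left(message) otherwise. Unlike a count-aware line-by-line reader, this grammar does not check the body against the counts in the header, so a full parser built this way would need extra care to tell a removed line starting with "-- " from the next file header.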
Stumbled onto this while looking to build a Scala parser for a git diff, as generated by running git diff-tree. This is very similar to unified diff, but it does have a few interesting variants.
I relied heavily on an answer above and ended up writing the parser included here. It's not strictly what the original poster was after, of course, but I figured it could be useful to others.