ANTLR or Regex?
I'm writing a CMS in ASP.NET/C#, and I need to process something like this on every page request:
<html>
<head>
<title>[Title]</title>
</head>
<body>
<form action="[Action]" method="get">
[TextBox Name="Email", Background=Red]
[Button Type="Submit"]
</form>
</body>
</html>
and replace the [...] of course.
My question is how I should implement it: with ANTLR or with Regex? Which will be faster? Note that if I implement it with ANTLR, I think I will need to implement XML in addition to the [..] syntax.
I will need to implement parameters, etc.
EDIT: Please note that my regex could look something like this:
public override string ToString()
{
    return Regex.Replace(Input, @"\[
        \s*(?<name>\w+)\s*
        (?<parameter>
            [\s,]*
            (?<paramName>\w+)
            \s*
            =
            \s*
            (
                (?<paramValue>\w+)
                |
                (""(?<paramValue>[^""]*)"")
            )
        )*
        \]", (match) =>
    {
        ...
    }, RegexOptions.IgnorePatternWhitespace);
}
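For readers wondering how the repeated paramName/paramValue groups can be read back inside the callback: in .NET, a group that matches several times keeps every match in its Captures collection. The following is only a sketch under assumptions, not the asker's actual callback. The class TagScanner, the helper ReadParameters, and the renderTag delegate are hypothetical names introduced for illustration; the pattern is the one from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagScanner
{
    // The pattern from the question, built once and reused for every request.
    private static readonly Regex TagPattern = new Regex(@"\[
        \s*(?<name>\w+)\s*
        (?<parameter>
            [\s,]*
            (?<paramName>\w+)
            \s*=\s*
            (
                (?<paramValue>\w+)
                |
                (""(?<paramValue>[^""]*)"")
            )
        )*
        \]", RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: pairs up the i-th paramName capture with the
    // i-th paramValue capture. Each repetition of the parameter group
    // records exactly one name and one value, so the counts line up.
    public static Dictionary<string, string> ReadParameters(Match match)
    {
        CaptureCollection names = match.Groups["paramName"].Captures;
        CaptureCollection values = match.Groups["paramValue"].Captures;
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < names.Count; i++)
        {
            result[names[i].Value] = values[i].Value;
        }
        return result;
    }

    // Replaces every [Tag ...] with whatever the (assumed) renderTag delegate returns.
    public static string Render(
        string input,
        Func<string, IDictionary<string, string>, string> renderTag)
    {
        return TagPattern.Replace(
            input,
            m => renderTag(m.Groups["name"].Value, ReadParameters(m)));
    }
}
```

A call such as TagScanner.Render(template, (name, args) => ...) would then hand each tag name and its parameters to your own rendering code. Keeping the Regex in a static field avoids re-parsing the pattern on every page request.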
Whether the correct tool is RegEx or ANTLR or even something else entirely should be heavily dependent on your requirements. The best answer to a "what tool to use" question shouldn't be primarily based on performance, but on the right tool for the job.
RegEx is a text search tool. If all you need to do is pull strings out of strings then it's often the hammer of choice. You'll likely want a tool to help you build your RegEx. I'd recommend Expresso, but there are lots of options out there.
ANTLR is a compiler generator. If you need error messages and parse actions or any of the complicated things that come with a compiler then it's a good option.
It looks like what you're doing is XML search/replace. Have you considered XPath? That would be my suggestion.
Choosing the right tool for the job is definitely important, something that should be researched and thought out before development begins. In all cases, it's important to fully understand the program requirements before making any decisions. Do you have a specification for the project? If not, spending the time to come up with one will save you all the time that a poor tool choice can cost you.
Hope that helps!
The performance of ANTLR vs. RegEx depends on the implementation of RegEx in C#. I know from experience that ANTLR is fast enough.
In ANTLR you can ignore certain content, like the XML. You can also look for the [ and ] and go further with processing. Both RegEx and ANTLR support your kind of parameters (I'm not sure about the "etc.").
In terms of development speed: RegEx is slightly faster for a case like this. You can use an online tool to develop the RegEx and see the capture groups while you edit it (Google "regex gskinner"). On the other hand, ANTLR has perfect support for error messages: they show line/column numbers and what was wrong. RegEx doesn't have this support.
A general approach for RegEx would be: create a "global scan" RegEx that finds the [...] groups in your content. Let the "..." be captured by a group, and then apply another RegEx to this smaller content (which splits it on the equals signs and commas). This way you have the best runtime performance and it's easy to develop.
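The two-pass idea described above could be sketched roughly like this. It is an illustration under assumptions, not a definitive implementation: the class TwoPassTags, the method names, and the renderTag delegate are hypothetical, and the second pass matches key=value pairs rather than literally splitting on commas. It also assumes that quoted values never contain a ] character, since the first pass stops at the first closing bracket.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TwoPassTags
{
    // Pass 1 ("global scan"): find each [ ... ] group. The first word is
    // the tag name, everything up to the closing bracket is the body.
    private static readonly Regex Scan = new Regex(
        @"\[\s*(?<name>\w+)(?<body>[^\]]*)\]");

    // Pass 2: inside a body, pick out key=value pairs where the value is
    // either double-quoted text or a bare word.
    private static readonly Regex Pair = new Regex(
        @"(?<key>\w+)\s*=\s*(?:""(?<value>[^""]*)""|(?<value>\w+))");

    // Turns the body of one tag into an ordered list of parameters.
    public static List<KeyValuePair<string, string>> ParseBody(string body)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in Pair.Matches(body))
        {
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups["key"].Value, m.Groups["value"].Value));
        }
        return pairs;
    }

    // Replaces each tag with the output of the (assumed) renderTag delegate.
    public static string Render(
        string input,
        Func<string, List<KeyValuePair<string, string>>, string> renderTag)
    {
        return Scan.Replace(
            input,
            m => renderTag(m.Groups["name"].Value, ParseBody(m.Groups["body"].Value)));
    }
}
```

The first regex stays simple and only runs once over the whole page, while the second one only ever sees the short text between the brackets, which keeps each pattern easy to test on its own.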
If the language you are parsing is regular, then regular expressions are certainly an option. If it is not, then ANTLR may be your only choice. If I understand these matters correctly, XML is not regular.