Splitting a text script into substrings by pattern
Consider the following script (it's total nonsense in a pseudo-language):
if (Request.hostMatch("asfasfasf.com") && someString.existsIn(new String[] {"brr", "hrr"})) {
    if (Request.clientIp("10.0.x.x")) {
        somevar = "1";
    }
    somevar = "2";
}
else {
    somevar = "first";
}
string foo = "foo";
// etc. etc.
How would you grab the if-block's parameters and contents from it? The if-block has the format:
if<whitespace>(<parameters>)<whitespace>{<contents>}<anything>
I tried using String.split() with the regex pattern ^if\s*\(|\)\s*\{|\}\s* but this fails miserably. Namely, the problem is that ) { also appears in the inner if-block, and the closing } appears in many places as well. I don't think either lazy or greedy matching works here.
So... any pointers to what I might need here in order to implement this with regex?
I also need to get the remaining string without the if-block's code (so the code starting from else { ...). Using just String.split() makes this difficult, as it gives no information about the length of the parts that were parsed away.
I initially created a loop-based solution for this (using String.substring() heavily), but it's dull. I would like to have something fancier instead. Should I go with regex, or create a custom, generic function (there are many other cases than just this one) that takes the parseable String and the pattern instead (consider the if<whitespace>(... pattern above)?
Edit: Changed returns to variable assignments, as it would not have made sense otherwise.
Answers (3)
You'd be far better off using (or writing) a parser than trying to do this with regex.
Regex is great for some things, but for complex parsing like this, it sucks. Another example where it sucks, and one that gets asked about a lot here, is parsing HTML: you can do it to a limited degree, but for anything complex, a DOM parser is a much better solution.
For a [very] simple parser, what you need is a recursive function that searches for the braces { and }, recursing down a level each time it comes across an opening brace and returning back up a level when it finds a closing brace. It then needs to store the string contents between the two braces at each level.
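A minimal Java sketch (Java 16+ for records) of the recursive function described above. It assumes braces never appear inside string literals or comments, since it does not handle either; BraceLevels, collect and Block are hypothetical names chosen for illustration. Each matching pair is recorded with its depth, innermost blocks first.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: recurse down one level on '{', return on '}', and record the text
// between each matching pair. Assumption: no braces inside strings/comments.
class BraceLevels {

    /** One brace pair: its nesting depth (1 = outermost) and inner text. */
    record Block(int depth, String contents) {}

    static List<Block> collect(String src) {
        List<Block> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '{') {
                i = descend(src, i + 1, 1, out); // index just past the matching '}'
            } else if (c == '}') {
                throw new IllegalArgumentException("unmatched '}' at " + i);
            } else {
                i++;
            }
        }
        return out;
    }

    // Scans from 'start' (just after an opening brace) to its matching '}'.
    private static int descend(String src, int start, int depth, List<Block> out) {
        int i = start;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '{') {
                i = descend(src, i + 1, depth + 1, out);             // go down one level
            } else if (c == '}') {
                out.add(new Block(depth, src.substring(start, i)));  // store this level
                return i + 1;                                        // go back up one level
            } else {
                i++;
            }
        }
        throw new IllegalArgumentException("unclosed '{' before index " + start);
    }
}
```

For example, collect("if (x) { if (y) { a = 1; } b = 2; } else { c = 3; }") yields three blocks: " a = 1; " at depth 2, then " if (y) { a = 1; } b = 2; " and " c = 3; " at depth 1. Because only braces are tracked, on the question's script the {"brr", "hrr"} array initializer inside the parameters would also be recorded as a block; parentheses need their own counter.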
A regular language won't work, because a regular grammar can't match things like "some number of opening parentheses followed by the same number of closing parentheses". A context-free grammar is needed for that.
Unless you use a context-free grammar parser for Java or a regular expression extension that makes regular expressions no longer regular, your loop-based solution is probably the fanciest solution.
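For comparison, a loop-based sketch of the kind this answer recommends (Java 16+ for records). A depth counter walks forward from the opening ( or { to its match, which yields the parameters, the contents and the remaining string (starting at else {) in one pass. It assumes double-quoted string literals without escaped quotes and no comments in the script; IfBlockExtractor and IfBlock are hypothetical names.

```java
// Loop-based sketch for: if<whitespace>(<parameters>)<whitespace>{<contents>}<anything>
// Assumptions: "..." literals contain no escaped quotes; the script has no comments.
class IfBlockExtractor {

    record IfBlock(String parameters, String contents, String remainder) {}

    static IfBlock extract(String src) {
        int i = skipWs(src, 0);
        if (!src.startsWith("if", i)) throw new IllegalArgumentException("expected 'if' at " + i);
        int open = skipWs(src, i + 2);
        int close = matchClose(src, open, '(', ')');          // index of the matching ')'
        String parameters = src.substring(open + 1, close);
        int bodyOpen = skipWs(src, close + 1);
        int bodyClose = matchClose(src, bodyOpen, '{', '}');  // index of the matching '}'
        String contents = src.substring(bodyOpen + 1, bodyClose);
        // Everything after the if-block, e.g. "else { ... }" and the rest of the script.
        String remainder = src.substring(bodyClose + 1).stripLeading();
        return new IfBlock(parameters, contents, remainder);
    }

    // Walks forward from the opening bracket at 'open', counting nesting depth,
    // and returns the index of the bracket that closes it.
    private static int matchClose(String s, int open, char o, char c) {
        if (open >= s.length() || s.charAt(open) != o)
            throw new IllegalArgumentException("expected '" + o + "' at " + open);
        int depth = 0;
        boolean inString = false;
        for (int i = open; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch == '"') {
                inString = !inString;                 // entering or leaving a "..." literal
            } else if (!inString) {
                if (ch == o) depth++;
                else if (ch == c && --depth == 0) return i;
            }
        }
        throw new IllegalArgumentException("no matching '" + c + "' for index " + open);
    }

    private static int skipWs(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
        return i;
    }
}
```

The depth counter is exactly the part a truly regular expression cannot express, which is why this stays a loop; returning the remainder also sidesteps the missing-length problem of String.split().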
As per the above, you'll need a parser. One type that's easy to implement (and fun to write!) is a recursive descent parser with backtracking. There is also a plethora of parser generators out there, though most of those have a learning curve. One Java-friendly parser generator is JavaCC.
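A toy recursive descent parser with backtracking, written by hand for a hypothetical grammar that covers the question's pseudo-language (Java 16+ for records). statement() saves the position, tries an if-statement first, and restores the position if that fails before trying a plain ;-terminated statement. It keeps each condition as raw text and does not handle string literals containing parentheses, comments, or else if chains; ToyParser and its node types are illustrative names, not JavaCC output or any library API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical grammar:
//   statements := statement*
//   statement  := ifStmt | simple      (ifStmt tried first, backtrack on failure)
//   ifStmt     := "if" "(" cond ")" block [ "else" block ]
//   block      := "{" statements "}"
//   simple     := text without braces, terminated by ';'
class ToyParser {
    interface Node {}
    record If(String cond, List<Node> then, List<Node> otherwise) implements Node {}
    record Simple(String text) implements Node {}

    private final String src;
    private int pos;

    private ToyParser(String src) { this.src = src; }

    /** Parses the whole input; throws if anything is left unparsed. */
    static List<Node> parse(String src) {
        ToyParser p = new ToyParser(src);
        List<Node> nodes = p.statements();
        p.ws();
        if (p.pos != src.length()) throw new IllegalArgumentException("cannot parse at index " + p.pos);
        return nodes;
    }

    private List<Node> statements() {
        List<Node> out = new ArrayList<>();
        for (Node n = statement(); n != null; n = statement()) out.add(n);
        return out;
    }

    private Node statement() {
        int saved = pos;
        Node n = ifStmt();
        if (n != null) return n;
        pos = saved;                               // backtrack: not an if-statement
        return simple();
    }

    private If ifStmt() {
        if (!keyword("if") || !symbol('(')) return null;
        int start = pos, depth = 1;
        while (pos < src.length() && depth > 0) {  // consume the condition up to its ')'
            char c = src.charAt(pos++);
            if (c == '(') depth++;
            else if (c == ')') depth--;
        }
        if (depth != 0) return null;
        String cond = src.substring(start, pos - 1).trim();
        List<Node> then = block();
        if (then == null) return null;
        List<Node> otherwise = List.of();
        int saved = pos;
        if (keyword("else")) {
            List<Node> b = block();
            if (b != null) otherwise = b;
            else pos = saved;                      // backtrack: "else" without a block
        }
        return new If(cond, then, otherwise);
    }

    private List<Node> block() {
        if (!symbol('{')) return null;
        List<Node> body = statements();
        return symbol('}') ? body : null;
    }

    private Simple simple() {
        ws();
        int i = pos;
        while (i < src.length() && "{};".indexOf(src.charAt(i)) < 0) i++;
        if (i >= src.length() || src.charAt(i) != ';') return null;
        Simple s = new Simple(src.substring(pos, i).trim());
        pos = i + 1;
        return s;
    }

    private boolean keyword(String kw) {
        ws();
        int end = pos + kw.length();
        if (!src.startsWith(kw, pos)) return false;
        if (end < src.length() && Character.isJavaIdentifierPart(src.charAt(end))) return false;
        pos = end;
        return true;
    }

    private boolean symbol(char c) {
        ws();
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void ws() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }
}
```

On the question's script (without the trailing comment), parse returns two nodes: an If whose then-branch holds the inner If and somevar = "2", whose otherwise-branch holds somevar = "first", followed by the simple statement string foo = "foo". A parser generator such as JavaCC would produce a similar structure from a grammar file instead of hand-written methods.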