ASPX attribute parsing with regular expressions in C#

Posted on 2024-12-15 06:34:31

I need to find attribute values in an ASPX file using regular expressions.

That means malformed HTML and other HTML-related issues are not a concern here.

I need to find the value of a particular attribute (LocText), that is, whatever is inside the quotes.
Any ASPX tags such as <%=, <%#, <%$ etc. inside the value have no special meaning for this attribute and are therefore treated as part of the value.

The regex I began with looks like this:

LocText="([^"]+)"

This works great: the first group, which is the result text, captures everything except double quotes, which are not allowed there (&quot; must be used instead).

But ASPX files also allow single quotes, in which case a second regular expression must be applied:

LocText='([^']+)'

I could use these two regular expressions but I'm looking for a way to connect them.

LocText=("([^"]+)"|'([^']+)')

This also works, but it doesn't seem very efficient because it creates an unnecessary number of groups. I think this could be done with backreferences, but I can't get it to work:

LocText=(["']{1})([^\1]+)\1

My thinking was that this saves the single or double quote in the first group, then reads anything that is NOT the character captured in the first group, and finally matches the closing quote with the backreference to the first group. Obviously I'm wrong, and it doesn't work like that.

Is there a way to combine the first two expressions while creating a minimal number of groups, with one group holding the attribute value I want? Is it possible to use a backreference for the single/double quote, or have I completely misunderstood what backreferences do?

Comments (1)

撩人痒 2024-12-22 06:34:31

I'd say your solution with alternation isn't that bad, but you could use named captures so the result will always be found in the same group's value:

Regex regexObj = new Regex(@"LocText=(?:""(?<attr>[^""]+)""|'(?<attr>[^']+)')");
resultString = regexObj.Match(subjectString).Groups["attr"].Value;

Explanation:

LocText=          # Match LocText=
(?:               # Either match
 "(?<attr>[^"]+)" # "...", capture in named group <attr>
|                 # or match
 '(?<attr>[^']+)' # '...', also capture in named group <attr>
)                 # End of alternation
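
To see how the named-group pattern behaves on both quote styles, here is a minimal sketch. The class name LocTextNamedGroupDemo, the method ExtractAll, and the sample markup in the usage note are made up for illustration and do not come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LocTextNamedGroupDemo
{
    // Same pattern as above. .NET lets both alternatives reuse the group
    // name "attr", so the value ends up in one group either way.
    private static readonly Regex LocTextRegex = new Regex(
        @"LocText=(?:""(?<attr>[^""]+)""|'(?<attr>[^']+)')");

    // Returns every LocText value found in the given markup, in order.
    public static List<string> ExtractAll(string markup)
    {
        var values = new List<string>();
        foreach (Match m in LocTextRegex.Matches(markup))
        {
            values.Add(m.Groups["attr"].Value);
        }
        return values;
    }
}
```

Calling LocTextNamedGroupDemo.ExtractAll on markup that contains LocText="it's" followed by LocText='<%= "x" %>' should return the two values it's and <%= "x" %>, since each alternative only excludes its own quote character.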

Another option would be to use lookahead assertions ([^\1] doesn't work because you can't place backreferences inside a character class, but you can use them in lookarounds):

Regex regexObj = new Regex(@"LocText=([""'])((?:(?!\1).)*)\1");
resultString = regexObj.Match(subjectString).Groups[2].Value;

Explanation:

LocText=   # Match LocText=
(["'])     # Match and capture (group 1) " or '
(          # Match and capture (group 2)...
 (?:       # Try to match...
  (?!\1)   # (unless it's the quote character we matched before)
  .        # any character
 )*        # repeat any number of times
)          # End of capturing group 2
\1         # Match the previous quote character
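
The commented layout above can also be passed to the Regex constructor more or less as written by using RegexOptions.IgnorePatternWhitespace. The following is only a sketch of that idea; the class name LocTextBackreferenceDemo and the method TryExtractFirst are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LocTextBackreferenceDemo
{
    // The lookahead pattern from above in free-spacing mode, so the
    // inline comments can stay inside the pattern string.
    private static readonly Regex LocTextRegex = new Regex(@"
        LocText=
        ([""'])        # group 1: the opening quote character
        (              # group 2: the attribute value
          (?:
            (?!\1)     # stop before the quote captured in group 1
            .          # otherwise consume one character
          )*
        )
        \1             # the matching closing quote",
        RegexOptions.IgnorePatternWhitespace);

    // Tries to read the first LocText value; returns false when nothing matches.
    public static bool TryExtractFirst(string markup, out string value)
    {
        Match m = LocTextRegex.Match(markup);
        value = m.Success ? m.Groups[2].Value : string.Empty;
        return m.Success;
    }
}
```

Two differences from the alternation version are worth keeping in mind: because of the * quantifier this pattern also matches an empty value such as LocText="", and because . does not match a newline by default, a value that spans several lines would need RegexOptions.Singleline as well.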