Regular expression to parse query string values into named groups

Posted on 2024-07-09 00:38:29 · 543 words · 13 views · 0 comments

I have some HTML with the following content:

... some text ...
<a href="file.aspx?userId=123&section=2">link</a> ... some text ...
... some text ...
<a href="file.aspx?section=5&user=678">link</a> ... some text ...
... some text ...

I would like to parse that and get a match with named groups:

match 1

group["user"]=123

group["section"]=2

match 2

group["user"]=678

group["section"]=5

I can do it if parameters always go in order, first User and then Section, but I don't know how to do it if the order is different.

Thank you!

寄居人 2024-07-16 00:38:29

In my case I had to parse a URL because the utility HttpUtility.ParseQueryString is not available in WP7. So I created an extension method like this:

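using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UriExtensions
{
    // Matches one name=value pair introduced by '?' or '&'.
    private static readonly Regex queryStringRegex;
    static UriExtensions()
    {
        queryStringRegex = new Regex(@"[\?&](?<name>[^&=]+)=(?<value>[^&=]+)");
    }

    public static IEnumerable<KeyValuePair<string, string>> ParseQueryString(this Uri uri)
    {
        // Because this is an iterator, the check runs when enumeration starts.
        if (uri == null)
            throw new ArgumentNullException("uri");

        // OriginalString is used because Uri.Query is not supported for relative URIs.
        var matches = queryStringRegex.Matches(uri.OriginalString);
        for (int i = 0; i < matches.Count; i++)
        {
            var match = matches[i];
            yield return new KeyValuePair<string, string>(match.Groups["name"].Value, match.Groups["value"].Value);
        }
    }
}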

Then it's just a matter of using it, for example:

        var uri = new Uri(HttpUtility.UrlDecode(@"file.aspx?userId=123&section=2"),UriKind.RelativeOrAbsolute);
        var parameters = uri.ParseQueryString().ToDictionary( kvp => kvp.Key, kvp => kvp.Value);
        var userId = parameters["userId"];
        var section = parameters["section"];

NOTE: I return an IEnumerable rather than a dictionary because a query string may contain duplicate parameter names. If it does, the ToDictionary call above will throw an exception.
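For readers outside .NET, the same name/value pattern and the duplicate-name caveat can be sketched in Python. This is only an illustration under assumptions: `parse_query_pairs` is a hypothetical helper name, and the sample URL with a repeated `section` parameter is made up for the demonstration.

```python
import re

# Same pattern as the C# answer, written with Python's (?P<name>...) syntax.
QUERY_PAIR = re.compile(r"[?&](?P<name>[^&=]+)=(?P<value>[^&=]+)")

def parse_query_pairs(url):
    """Return (name, value) tuples in the order they appear in the URL."""
    return [(m.group("name"), m.group("value")) for m in QUERY_PAIR.finditer(url)]

# Hypothetical URL with a duplicated parameter name.
pairs = parse_query_pairs("file.aspx?userId=123&section=2&section=7")
print(pairs)

# Unlike ToDictionary in C#, dict() does not raise on duplicate keys;
# it silently keeps the last value, so the list of pairs is the safer result.
print(dict(pairs))
```

Returning the list of pairs keeps every occurrence; converting to a dictionary is left to the caller, who knows whether duplicates matter.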


呆° 2024-07-16 00:38:29

Why use a regular expression to split it?

You could first extract the query string, split the result on &, and then build a map by splitting each piece on =.
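The split-based idea could look roughly like the following Python sketch. `query_to_map` is a hypothetical name, and the sketch assumes the href value has already been pulled out of the HTML and needs no percent-decoding.

```python
def query_to_map(href):
    """Build a {name: value} map from the query string of an href value."""
    # Everything after the first '?' is the query string.
    _, sep, query = href.partition("?")
    if not sep:
        return {}  # no '?' means no parameters
    result = {}
    for pair in query.split("&"):
        if not pair:
            continue  # skip empty pieces such as a trailing '&'
        # Split on the first '=' only, so values may contain '='.
        name, _, value = pair.partition("=")
        result[name] = value
    return result

print(query_to_map("file.aspx?section=5&user=678"))
print(query_to_map("file.aspx?userId=123&section=2"))
```

Parameter order no longer matters because lookups go through the map. In real Python code, `urllib.parse.parse_qs` from the standard library covers the same ground and also handles percent-decoding and repeated names.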


时光瘦了 2024-07-16 00:38:29

You didn't specify what language you are working in, but this should do the trick in C#:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace RegexTest
{
    class Program
    {
        static void Main(string[] args)
        {
            string subjectString = @"... some text ...
                <a href=""file.aspx?userId=123&section=2"">link</a> ... some text ...
... some text ...
<a href=""file.aspx?section=5&user=678"">link</a> ... some text ...
... some text ...";
            Regex regexObj = 
               new Regex(@"<a href=""file.aspx\?(?:(?:userId=(?<user>.+?)&section=(?<section>.+?)"")|(?:section=(?<section>.+?)&user=(?<user>.+?)""))");
            Match matchResults = regexObj.Match(subjectString);
            while (matchResults.Success)
            {
                string user = matchResults.Groups["user"].Value;
                string section = matchResults.Groups["section"].Value;
                Console.WriteLine(string.Format("User = {0}, Section = {1}", user, section));
                matchResults = matchResults.NextMatch();
            }
            Console.ReadKey();
        }
    }
}


行雁书 2024-07-16 00:38:29

Using a regex to first find the key-value pairs and then splitting them... doesn't seem right.

I'm interested in a complete regex solution.

Anyone?


苏佲洛 2024-07-16 00:38:29

Check this out

\<a\s+href\s*=\s*["'](?<baseUri>.+?)\?(?:(?<key>.+?)=(?<value>.+?)[&"'])*\s*\>

You can then read the pairs with something like Groups["key"].Captures[i] and Groups["value"].Captures[i].


呆头 2024-07-16 00:38:29

Perhaps something like this (I am rusty on regex, and wasn't good at them in the first place anyway. Untested):

/href="[^?]*(?:[?&](?:userId=(?<user>\d+)|section=(?<section>\d+)))*"/

(By the way, the XHTML is malformed; & should be &amp; in the attributes.)


半夏半凉 2024-07-16 00:38:29

Another approach is to put the capturing groups inside lookaheads:

Regex r = new Regex(@"<a href=""file\.aspx\?" +
                    @"(?=[^""<>]*?user=(?<user>\w+))" +
                    @"(?=[^""<>]*?section=(?<section>\w+))");

If there are only two parameters, there's no reason to prefer this over the alternation-based approaches suggested by Mike and strager. But if you needed to match three parameters, the other regexes would grow to several times their current length, while this one would only need one more lookahead, just like the two existing ones.

By the way, contrary to your response to Claus, it matters quite a bit which language you're working in. There's a huge variation in capabilities, syntax, and API from one language to the next.
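To illustrate the point about language differences, here is a minimal Python sketch of the same lookahead technique. It is an adaptation, not the answer's code: Python spells named groups `(?P<name>...)`, and the optional `(?:Id)?` and the `\b` anchors are assumptions added so that both `userId=` and `user=` from the question's sample are accepted.

```python
import re

# Each lookahead scans forward from just after '?' without consuming text,
# so the two parameters can appear in either order.
LINK = re.compile(
    r'<a href="file\.aspx\?'
    r'(?=[^"<>]*?\buser(?:Id)?=(?P<user>\w+))'
    r'(?=[^"<>]*?\bsection=(?P<section>\w+))'
)

html = '''... some text ...
<a href="file.aspx?userId=123&section=2">link</a> ... some text ...
... some text ...
<a href="file.aspx?section=5&user=678">link</a> ... some text ...
... some text ...'''

# Groups captured inside a successful lookahead remain available on the match.
results = [m.groupdict() for m in LINK.finditer(html)]
for r in results:
    print(r["user"], r["section"])
```

Supporting a third parameter would mean appending one more lookahead of the same shape, which is the scaling advantage described above.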


南渊 2024-07-16 00:38:29

You did not say which regex flavor you are using. Since your sample URL links to an .aspx file, I'll assume .NET. In .NET, a single regex can have multiple named capturing groups with the same name, and .NET will treat them as if they were one group. Thus you can use the regex

userID=(?<user>\d+)&section=(?<section>\d+)|section=(?<section>\d+)&userID=(?<user>\d+)

This simple regex with alternation will be far more efficient than any tricks with lookaround. You can easily expand it if your requirements include matching the parameters only if they're in a link.
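Reusing a group name in both branches is a .NET feature; Python's `re` module, for instance, rejects a pattern that defines the same group name twice. A rough Python adaptation of the alternation idea therefore gives each branch its own names and merges them afterwards. The `_a`/`_b` suffixes and the `extract` helper are hypothetical, and `user(?:Id)?` is an assumption so that both spellings in the question's sample match.

```python
import re

# One branch per parameter order; each branch needs distinct group names in Python.
PARAMS = re.compile(
    r"user(?:Id)?=(?P<user_a>\d+)&section=(?P<section_a>\d+)"
    r"|section=(?P<section_b>\d+)&user(?:Id)?=(?P<user_b>\d+)"
)

def extract(text):
    """Return one {'user', 'section'} dict per match, whichever branch matched."""
    out = []
    for m in PARAMS.finditer(text):
        # Groups from the branch that did not participate are None.
        user = m.group("user_a") or m.group("user_b")
        section = m.group("section_a") or m.group("section_b")
        out.append({"user": user, "section": section})
    return out

sample = ('<a href="file.aspx?userId=123&section=2">link</a> '
          '<a href="file.aspx?section=5&user=678">link</a>')
print(extract(sample))
```

The merge step is the price of the missing duplicate-name support; in .NET the two branches can simply share `user` and `section`.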


抚你发端 2024-07-16 00:38:29

A simple Python implementation that overcomes the ordering problem:

In [2]: x = re.compile(r'(?:(userId|section)=(\d+))+')

In [3]: t = 'href="file.aspx?section=2&userId=123"'

In [4]: x.findall(t)
Out[4]: [('section', '2'), ('userId', '123')]

In [5]: t = 'href="file.aspx?userId=123&section=2"'

In [6]: x.findall(t)
Out[6]: [('userId', '123'), ('section', '2')]
