How can I speed up this LINQ query?

Posted 2025-01-02 11:28:06

I was using long.TryParse, but switched to regex. Currently, processing a 123K+ message takes 7+ ms in total, measured from XElement.Parse to the end of the foreach loops.
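If the check only needs to decide whether a whole attribute value is a 13-16 digit number (which is what long.TryParse tested), a plain character loop is one possible alternative to both parsing and regex. A minimal sketch; the class and method names are hypothetical, and it assumes only ASCII digits should count:

```csharp
public static class DigitCheck
{
    // Hypothetical helper: true when the entire value is 13-16 ASCII digits.
    // Unlike long.TryParse, it rejects leading/trailing whitespace and a sign;
    // unlike the regex class \d, it accepts only '0'-'9'.
    public static bool IsCardLengthNumber(string value)
    {
        if (value.Length < 13 || value.Length > 16)
            return false;

        foreach (char c in value)
        {
            if (c < '0' || c > '9')
                return false;
        }
        return true;
    }
}
```

Note that this tests the whole value, whereas the regex `\b\d+\b` below matches a number anywhere inside the value, so the two are not interchangeable for values like "helloooo 1234567".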

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

Stopwatch s1 = Stopwatch.StartNew();
XElement element = XElement.Parse(xml);

string pattern = @"\b\d+\b";
Regex r = new Regex(pattern);

// Elements with exactly one attribute whose value is 13-16 characters long
// and contains a standalone run of digits.
IEnumerable<XElement> elementsWithPossibleCCNumbers = element
    .Descendants()
    .Where(d => d.Attributes()
        .Where(a => a.Value.Length >= 13 &&
               a.Value.Length <= 16 &&
               r.IsMatch(a.Value)).Count() == 1);

foreach (var x in elementsWithPossibleCCNumbers)
{
    foreach (var a in x.Attributes())
    {
        // Check if the value contains a standalone number
        if (r.IsMatch(a.Value))
        {
            // Check if the value could be a credit card number
            if (a.Value.Length >= 13 && a.Value.Length <= 16)
            {
                a.Value = Regex.Replace(a.Value, @"\b\d{13,16}\b", match =>
                    new String('*', match.Value.Length - 4) +
                    match.Value.Substring(match.Value.Length - 4));
            }
            else // If the value is not a credit card number, replace it with ***
            {
                a.Value = Regex.Replace(a.Value, @"\b\d+\b", "***");
            }
        }
    }
}

xml = element.ToString();
s1.Stop();

XElement.Parse(xml) takes between 2 and 3 ms.

The LINQ query takes between 0.004 and 0.005 ms.

The foreach statements take between 4 and 5 ms.
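The 0.004-0.005 ms figure for the LINQ query most likely measures only building the query, not running it: LINQ uses deferred execution, so the Where filter (including every r.IsMatch call in it) actually runs while the foreach loop enumerates the results, and its cost is part of the 4-5 ms. A minimal sketch that makes this visible with a call counter; the class and method names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how often the Where predicate runs before and after enumeration.
    public static (int Before, int After, int Matches) CountPredicateCalls(IEnumerable<int> source)
    {
        int calls = 0;
        IEnumerable<int> query = source.Where(n => { calls++; return n % 2 == 0; });

        int before = calls;                  // defining the query ran nothing
        List<int> results = query.ToList();  // enumeration runs the predicate
        return (before, calls, results.Count);
    }
}
```

To time the filtering on its own, one option is to materialize the query with ToList() inside the timed region, before the foreach loop.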

焚却相思 2025-01-09 11:28:06

It appears you're doing two search and replacements:

  1. Replace every CC number with *'s and the last 4 digits
  2. Replace any other "CC-ish" number on the same element with *'s.
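The two replacements above can be sketched as a single function applied to one attribute value. This is a minimal illustration, not the answer's exact code; MaskValue and CardMasker are hypothetical names, and the patterns are the ones used in this thread:

```csharp
using System.Text.RegularExpressions;

public static class CardMasker
{
    // A standalone run of 13-16 digits is treated as a possible card number.
    private static readonly Regex CcDigits =
        new Regex(@"\b\d{13,16}\b", RegexOptions.Compiled);

    // Any other standalone run of digits.
    private static readonly Regex BasicDigits =
        new Regex(@"\b\d+\b", RegexOptions.Compiled);

    public static string MaskValue(string value)
    {
        if (CcDigits.IsMatch(value))
        {
            // Step 1: keep only the last four digits of each card-like run.
            return CcDigits.Replace(value, m =>
                new string('*', m.Value.Length - 4) + m.Value.Substring(m.Value.Length - 4));
        }

        // Step 2: replace every other standalone number with ***.
        return BasicDigits.Replace(value, "***");
    }
}
```

Because the two steps are alternatives (if/else), a value that contains a card-like run keeps any other numbers in it: "card 4444555566667777 id 42" becomes "card ************7777 id 42".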

One approach would be to make XLinq work a little bit harder for you:

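using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Build both patterns once and reuse them (in a class these would be
// static readonly fields). xelt is the parsed XElement.
Regex basicDigits = new Regex(@"\b\d+\b", RegexOptions.Compiled);
Regex ccDigits = new Regex(@"\b\d{13,16}\b", RegexOptions.Compiled);

// you're not using the elements, ignore them, just get the attributes
foreach (var atr in xelt.Descendants()
                        .Where(e => e.Attributes()
                                     .Any(a => a.Value.Length >= 13
                                            && a.Value.Length <= 16))
                        .SelectMany(e => e.Attributes()))
{
    if (ccDigits.IsMatch(atr.Value))
    {
        // Keep only the last four digits of each 13-16 digit run.
        atr.Value = ccDigits.Replace(
            atr.Value,
            mm => new String('*', mm.Value.Length - 4)
                  + mm.Value.Substring(mm.Value.Length - 4));
    }
    else
    {
        // Any other standalone number on a candidate element becomes ***.
        atr.Value = basicDigits.Replace(atr.Value, "***");
    }
}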

// using 150k XML (1k nodes/5k attrs, 3 attr/node avg, avg depth 4 nodes)
// with 10% match rate:
// - 25.7 MB/s (average 100 trials)
// - 61 attributes/ms

Sample input XML:

<item f1="abc123abc" f2="helloooo 1234567" f3="abc123abc">
     <item f1="abc123abc" f2="helloooo 1234567" f3="abc123abc" real1="4444555566667777" />
     <item f1="abc123abc" f2="helloooo 1234567" f3="abc123abc" />
 </item>

Output:

<item f1="abc123abc" f2="helloooo 1234567" f3="abc123abc">
     <item f1="abc123abc" f2="helloooo ***" f3="abc123abc" real1="************7777" />
     <item f1="abc123abc" f2="helloooo 1234567" f3="abc123abc" />
</item>
倒带 2025-01-09 11:28:06

You may want to consider precompiling your regex. The article here: http://en.csharp-online.net/CSharp_Regular_Expression_Recipes%E2%80%94Compiling_Regular_Expressions explains the pros and cons of compiling regular expressions quite nicely.
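RegexOptions.Compiled trades a one-time construction cost for faster matching, so it tends to pay off only when a single Regex instance is reused many times, as in the loops above. A minimal sketch that builds the same pattern both ways and times repeated Replace calls; the class name, input string, and iteration count are illustrative assumptions, and actual timings will vary by runtime and machine:

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexCompileComparison
{
    // Runs `iterations` Replace calls with one Regex instance and times them.
    public static (string result, TimeSpan elapsed) Time(Regex regex, string input, int iterations)
    {
        string result = input;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            result = regex.Replace(input, "***");
        }
        sw.Stop();
        return (result, sw.Elapsed);
    }

    public static void Run()
    {
        const string pattern = @"\b\d+\b";
        const string input = "helloooo 1234567 abc123abc 42";

        // Interpreted: cheap to construct, slower per match.
        var interpreted = new Regex(pattern);
        // Compiled: extra up-front cost to generate code, faster per match.
        var compiled = new Regex(pattern, RegexOptions.Compiled);

        var (r1, t1) = Time(interpreted, input, 100_000);
        var (r2, t2) = Time(compiled, input, 100_000);

        Console.WriteLine($"interpreted: {t1.TotalMilliseconds:F1} ms, compiled: {t2.TotalMilliseconds:F1} ms");
        Console.WriteLine($"same output: {r1 == r2}");
    }
}
```

Whichever option is used, keeping the Regex in a static readonly field (rather than constructing it inside the loop) is what lets the construction cost be paid once.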
