C# speed up parsers using constants? Abstract classes though

Posted on 2024-11-08 06:56:48

I have a series of parsers that parse the same basic sort of text for relevant data, but the documents come from various sources, so the parsers differ subtly. I am parsing millions of documents per day, so any speed optimization helps.

Here is a simplified example to show the fundamental issue. The parser is set up such that there is a base abstract parser that actual parsers implement:

abstract class BaseParser
{
    protected abstract string SomeRegex { get; }

    public string ParseSomethingCool(string text)
    {
        return Regex.Match(text, SomeRegex).Value;
    }

    ....
}

class Parser1 : BaseParser
{
    protected override string SomeRegex { get { return "^.*"; } } // example regex

    ...
}

class Parser2 : BaseParser
{
    protected override string SomeRegex { get { return "^[0-9]+"; } } // example regex

    ...
}

So my questions are:

  • If I made the values returned by the getters constants, would it speed things up?
  • Theoretically, if it didn't use a property at all and everything was a straight-up constant, would that speed things up more?
  • What sort of speed increase, if any, could I expect?
  • Am I just clutching at straws?

Comments (4)

后知后觉 2024-11-15 06:56:48

I don't think converting the properties to constants will give you any appreciable performance boost. The JIT-compiled code probably has those inlined anyway (since the getters return constants).

I think the best approach is to profile your code first and see which parts have the most potential for optimization. My suggestions for things to look at:

  1. RegEx - as you already know, a well-constructed regular expression can sometimes be the difference between fast and extremely slow. It really is case by case, depending on the expression used and the text you feed it.
  2. Alternatives - I'm not sure what kind of matching you perform, but it might be worth considering other approaches, especially if what you are trying to match is not that complex. Then benchmark the results.
  3. Other parts of your code - see where the bottleneck occurs. Is it in disk I/O or CPU? See if more threads will help, or maybe revisit the function that reads the file contents.

Whatever you end up doing, it's always a big help to measure. Identify the areas with opportunity, find a faster way to do it, then measure again to verify that it is indeed faster.
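As a rough illustration of the "measure first" advice, here is a minimal timing sketch using Stopwatch. It compares the static Regex.Match(text, pattern) call from the question with a Regex instance that is built once and reused. The names RegexTiming, Time and Compare are hypothetical and made up for this example; a real profiler or a benchmarking library would give more reliable numbers than a hand-rolled loop like this.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question.
static class RegexTiming
{
    // Times `iterations` calls of `parse` on `text` and returns elapsed milliseconds.
    public static long Time(Func<string, string> parse, string text, int iterations)
    {
        parse(text); // warm-up call so JIT and regex construction are not timed
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            parse(text);
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Compares the static Regex.Match(text, pattern) call used in the question
    // with a Regex instance that is constructed once and reused.
    public static (long StaticMs, long CachedMs) Compare(string pattern, string text, int iterations)
    {
        var cached = new Regex(pattern, RegexOptions.Compiled);
        long staticMs = Time(t => Regex.Match(t, pattern).Value, text, iterations);
        long cachedMs = Time(t => cached.Match(t).Value, text, iterations);
        return (staticMs, cachedMs);
    }
}
```

Calling something like RegexTiming.Compare("^[0-9]+", sampleText, 1000000) on a representative document (sampleText is a placeholder here) gives two numbers to compare; the absolute values depend on the machine and the input, so only the relative difference on your own data is meaningful.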

风流物 2024-11-15 06:56:48

The values returned by the getters are already constants.

I bet the jitter is already optimizing away the property accessors, so you probably won't see much performance gain by refactoring them out.
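A small sketch of why the getters already behave like constants: a string literal in C# is interned, so a getter that returns a literal hands back the same string object on every call and allocates nothing. The Pattern property and the GetterDemo class below are hypothetical additions so that the value can be inspected from outside; they are not in the question's code. This shows only that the returned value is constant, not what the JIT does with the virtual call.

```csharp
using System;

abstract class BaseParser
{
    protected abstract string SomeRegex { get; }

    // Hypothetical helper, exposed only so the sketch can inspect the value.
    public string Pattern => SomeRegex;
}

sealed class Parser2 : BaseParser
{
    protected override string SomeRegex { get { return "^[0-9]+"; } }
}

static class GetterDemo
{
    public static bool SameInstanceEveryCall()
    {
        BaseParser p = new Parser2();
        // String literals are interned, so both calls return the same object.
        return object.ReferenceEquals(p.Pattern, p.Pattern);
    }
}
```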

佞臣 2024-11-15 06:56:48

I don't think you'd see appreciable speed improvements from this kind of optimisation. Your best bet, though, is to try it and benchmark the results.

One change that would make a difference is to not use Regex if you can get away without it. Regex is a pretty big and useful hammer, but not every nail needs a hammer that big.
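To make the "smaller hammer" idea concrete, here is a sketch that replaces the question's example pattern "^[0-9]+" with a plain loop. It assumes the default regex options (no Multiline), under which that pattern can only match at the very start of the input. LeadingDigits is a hypothetical name, and whether this is actually faster for your documents is something to benchmark rather than assume.

```csharp
using System;

// Hypothetical replacement for Regex.Match(text, "^[0-9]+").Value.
static class LeadingDigits
{
    // Returns the run of ASCII digits at the start of text, or "" if there is none.
    public static string Parse(string text)
    {
        int i = 0;
        while (i < text.Length && text[i] >= '0' && text[i] <= '9')
        {
            i++;
        }
        return text.Substring(0, i);
    }
}
```

Like a failed Regex.Match, this returns an empty string when the text does not start with a digit, so callers that check for an empty result keep working.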

你是暖光i 2024-11-15 06:56:48

From the code you show, it's not clear why you need an abstract class and inheritance.
Calling virtual members is slower than calling non-virtual ones. Moreover, your child classes aren't sealed.

Why don't you do something like this:

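using System.Text.RegularExpressions;

public class Parser
{
    // Built once in the constructor and reused for every document.
    private readonly Regex regex;

    public Parser(string someRegex)
    {
        regex = new Regex(someRegex, RegexOptions.Compiled);
    }

    public string ParseSomethingCool(string text)
    {
        return regex.Match(text).Value;
    }
}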

or like this

public static class Parser
{
    public static string ParseSomethingCool(string text, string someRegex)
    {
        return Regex.Match(text, someRegex).Value;
    }
}
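If the class hierarchy is needed for other reasons, the cached-instance idea can probably be combined with it. The sketch below is one possible arrangement, not the author's code: each subclass passes its pattern to the base constructor once, the base class keeps a single Regex, and the subclasses are sealed. The names CachedParserBase, CachedParser1 and CachedParser2 are hypothetical.

```csharp
using System.Text.RegularExpressions;

abstract class CachedParserBase
{
    private readonly Regex regex;

    // Each subclass passes its pattern once; the Regex is built a single time.
    protected CachedParserBase(string pattern)
    {
        regex = new Regex(pattern, RegexOptions.Compiled);
    }

    public string ParseSomethingCool(string text)
    {
        return regex.Match(text).Value;
    }
}

sealed class CachedParser1 : CachedParserBase
{
    public CachedParser1() : base("^.*") { }
}

sealed class CachedParser2 : CachedParserBase
{
    public CachedParser2() : base("^[0-9]+") { }
}
```

ParseSomethingCool is no longer virtual here, and no virtual property is read per call; whether that matters in practice is again a question for the profiler.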

However, I think you would get the greatest performance gain from multi-threading. You probably do that already. If you don't, take a look at the Task Parallel Library.
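A minimal sketch of what that could look like with Parallel.For, assuming the documents are already loaded into memory and each one can be parsed independently. ParallelParsing and ParseAll are hypothetical names, and the pattern is the example one from the question. A Regex instance is immutable and safe to share between threads for matching, so a single instance is reused.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original answer.
static class ParallelParsing
{
    // One shared instance: Regex is thread-safe for matching.
    private static readonly Regex Digits = new Regex("^[0-9]+", RegexOptions.Compiled);

    public static string[] ParseAll(IReadOnlyList<string> documents)
    {
        var results = new string[documents.Count];
        // Each iteration writes only to its own slot, so no locking is needed.
        Parallel.For(0, documents.Count, i =>
        {
            results[i] = Digits.Match(documents[i]).Value;
        });
        return results;
    }
}
```

If reading the files is the bottleneck rather than the matching, parallelising the parse step alone may not help much, which is another reason to profile first.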