C#: Would using constants speed up a parser, even though it is built on an abstract class?
I have a series of parsers which parse the same basic sort of text for relevant data, but the documents come from various sources, so the parsers differ subtly. I am parsing millions of documents per day, so any speed optimization helps.
Here is a simplified example to show the fundamental issue. The parser is set up such that there is a base abstract parser that actual parsers implement:
abstract class BaseParser
{
    protected abstract string SomeRegex { get; }

    public string ParseSomethingCool(string text)
    {
        return Regex.Match(text, SomeRegex).Value;
    }
    ....
}

class Parser1 : BaseParser
{
    protected override string SomeRegex { get { return "^.*"; } } // example regex
    ...
}

class Parser2 : BaseParser
{
    protected override string SomeRegex { get { return "^[0-9]+"; } } // example regex
    ...
}
So my questions are:
- If I were to make the things returned in the get constants, would it speed things up?
- Theoretically, if it didn't use a property and everything was a straight-up constant, would that speed things up more?
- What sort of speed increases, if any, could I see?
- Am I just clutching at straws?
Comments (4)
I don't think converting the properties to constants will give you any appreciable performance boost. The JIT-compiled code probably has those inlined anyway (since the getters just return constants).
I think the best approach is to profile your code first and see which parts have the most potential for optimization. My suggestions for things to look at:
Whatever you end up doing, it's always a big help to measure. Identify the areas with opportunity, find a faster way to do it, then measure again to verify that it is indeed faster.
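To make the "measure" advice concrete, here is a rough timing sketch using `Stopwatch`. It compares the question's abstract-property design with a constant-based variant. `ConstParser2`, `ParserTiming`, the sample input, and the iteration count are all made up for illustration, and a simple loop like this gives only a coarse indication, not a rigorous benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// The question's design: the pattern comes from an overridden property.
abstract class BaseParser
{
    protected abstract string SomeRegex { get; }

    public string ParseSomethingCool(string text)
    {
        return Regex.Match(text, SomeRegex).Value;
    }
}

class Parser2 : BaseParser
{
    protected override string SomeRegex { get { return "^[0-9]+"; } }
}

// Hypothetical alternative: the same pattern as a compile-time constant.
static class ConstParser2
{
    private const string SomeRegex = "^[0-9]+";

    public static string ParseSomethingCool(string text)
    {
        return Regex.Match(text, SomeRegex).Value;
    }
}

static class ParserTiming
{
    // Times `iterations` calls of `parse` on `input`, after one warm-up call
    // so that JIT compilation is not counted.
    public static TimeSpan Measure(Func<string, string> parse, string input, int iterations)
    {
        parse(input);
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            parse(input);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Run()
    {
        const string sample = "12345 widgets"; // made-up input
        const int iterations = 1000000;        // arbitrary; adjust to taste
        BaseParser parser = new Parser2();

        TimeSpan viaProperty = Measure(parser.ParseSomethingCool, sample, iterations);
        TimeSpan viaConstant = Measure(ConstParser2.ParseSomethingCool, sample, iterations);

        Console.WriteLine("abstract property: " + viaProperty.TotalMilliseconds + " ms");
        Console.WriteLine("constant:          " + viaConstant.TotalMilliseconds + " ms");
    }
}
```

Calling `ParserTiming.Run()` prints both timings. If the two numbers are within noise of each other, that supports the point above that the property is not where the time goes, and a profiler run over real documents is the better next step.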
The things in the get already are constant.
I bet the jitter is already optimizing away the property accessors, so you probably won't see much performance gain by refactoring them out.
I don't think you'd see appreciable speed improvements from this kind of optimisation. Your best bet, though, is to try it and benchmark the results.
One change that would make a difference is to not use Regex if you can get away without it. Regex is a pretty big and useful hammer, but not every nail needs a hammer that big.
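As an illustration of dropping Regex where a simpler tool will do, the question's `^[0-9]+` example can be written as a plain character loop. `DigitPrefix` and `LeadingDigits` are hypothetical names, and the sketch assumes the pattern really is just "leading ASCII digits"; whether it is actually faster on real documents would still need measuring.

```csharp
public static class DigitPrefix
{
    // Returns the leading run of ASCII digits in `text`, or "" if there is none.
    // This mirrors what Regex.Match(text, "^[0-9]+").Value returns.
    public static string LeadingDigits(string text)
    {
        int length = 0;
        while (length < text.Length && text[length] >= '0' && text[length] <= '9')
        {
            length++;
        }
        return text.Substring(0, length);
    }
}
```

For example, `DigitPrefix.LeadingDigits("12345 widgets")` yields `"12345"`, and an input with no leading digits yields the empty string, just as `Match.Value` does for a failed match.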
From the code you show, it is not clear why you need an abstract class and inheritance.
Using virtual members is slower. Moreover, your child classes aren't sealed.
Why don't you do something like this:
or like this
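One possible shape for a design without virtual members, offered only as a hypothetical sketch and not as this answer's own code: a single sealed class that receives its pattern through the constructor and keeps a prebuilt `Regex`. The names `Parser` and `Parsers` are invented here, and the use of `RegexOptions.Compiled` is an assumption that may or may not pay off for a given workload.

```csharp
using System.Text.RegularExpressions;

// Hypothetical replacement for BaseParser/Parser1/Parser2:
// one sealed class, no virtual members, pattern supplied at construction.
public sealed class Parser
{
    private readonly Regex someRegex;

    public Parser(string pattern)
    {
        // Build the Regex once instead of passing the pattern string on every call.
        someRegex = new Regex(pattern, RegexOptions.Compiled);
    }

    public string ParseSomethingCool(string text)
    {
        return someRegex.Match(text).Value;
    }
}

public static class Parsers
{
    // The two example patterns from the question.
    public static readonly Parser Parser1 = new Parser("^.*");
    public static readonly Parser Parser2 = new Parser("^[0-9]+");
}
```

This only works if the real parsers differ in data (patterns) rather than in behaviour; if they override actual logic, inheritance or delegates may still be needed.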
However, I think you would achieve the greatest performance gain by using multi-threading. You probably already do. If you don't, take a look at the Task Parallel Library.
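A minimal sketch of what that could look like with the Task Parallel Library's `Parallel.For`, assuming each document can be parsed independently of the others. `ParallelParsing` and `ParseAll` are hypothetical names, and the pattern is the question's `^[0-9]+` example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class ParallelParsing
{
    // A single shared Regex; matching on a Regex instance is safe from multiple threads.
    private static readonly Regex LeadingDigits = new Regex("^[0-9]+", RegexOptions.Compiled);

    // Parses every document in parallel; results[i] corresponds to documents[i].
    public static string[] ParseAll(IReadOnlyList<string> documents)
    {
        var results = new string[documents.Count];
        Parallel.For(0, documents.Count, i =>
        {
            // Each iteration writes to its own slot, so no locking is needed.
            results[i] = LeadingDigits.Match(documents[i]).Value;
        });
        return results;
    }
}
```

Whether this helps depends on where the time goes: if reading the documents from disk or the network dominates, parallelising the parsing alone may change little, so it is worth measuring before and after.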