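Speeding up a parsing algorithm
I'm trying to parse some ddump files. Could you please help me speed up my algorithm?
Each loop iteration takes 216 ms, which is far too much. I would like to get it down to around 40-50 ms per iteration. Maybe by using a regular expression?
Here is my algorithm: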
while (pos < EntireFile.Length && (/*curr = */EntireFile.Substring(pos, EntireFile.Length - pos)).Contains(" class"))
{
    w.Reset();
    w.Start();
    pos = EntireFile.ToLower().IndexOf(" class", pos) + 6;
    int end11 = EntireFile.ToLower().IndexOf("extends", pos);
    if (end11 == -1)
        end11 = EntireFile.IndexOf("\r\n", pos);
    else
    {
        int end22 = EntireFile.IndexOf("\r\n", pos);
        if (end22 < end11)
            end11 = end22;
    }
    //string opcods = EntireFile.Substring(pos, EntireFile.Length - pos);
    string Cname = EntireFile.Substring(pos, end11 - pos).Trim();
    pos += (end11 - pos) + 7;
    pos = EntireFile.IndexOf("{", pos) + 1;
    int count = 1;
    string searching = EntireFile.Substring(pos, EntireFile.Length - pos);
    int searched = 0;
    while (count != 0)
    {
        if (searching[searched] == '{')
            count++;
        else if (searching[searched] == '}')
            count--;
        searched++;
    }
    string Content = EntireFile.Substring(pos, searched);
    tlist.Add(new TClass() { ClassName = Cname, Content = Content });
    pos += searched;
    if (pos % 3 == 0)
    {
        double prc = ((double)pos) * 100d / ((double)EntireFile.Length);
        int prcc = (int)Math.Round(prc);
        wnd.UpdateStatus(prcc);
        wnd.Update();
    }
    mils.Add((int)w.ElapsedMilliseconds);
}
Any help would be greatly appreciated.
Comments (5)
Well, doing this multiple times [...] certainly will not help. There are several things you can do:
Perform expensive operations (ToLower, IndexOf, etc.) only once and cache the results if possible.
Don't narrow down the input you are processing with Substring; this will kill your performance. Rather, keep a separate int parseStart value and use that as an additional parameter to all of your IndexOf calls. In other words, keep track of the part of the file you have parsed manually instead of taking a smaller substring each time.
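The two suggestions above can be sketched roughly as follows. This is only an illustration, not the asker's final code: the ClassScanner class and the FindClassKeywords method are hypothetical names, and ToLowerInvariant is assumed to be an acceptable stand-in for the ToLower call in the question. The text is lower-cased once, and every search runs against that one string from a moving parseStart offset, so no substring is ever created.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the original question's code.
static class ClassScanner
{
    // Returns the start index of every " class" keyword in entireFile.
    public static List<int> FindClassKeywords(string entireFile)
    {
        const string keyword = " class";

        // Lower-case the text once and reuse the result for every search.
        string lowered = entireFile.ToLowerInvariant();

        var hits = new List<int>();
        int parseStart = 0; // everything before this offset has been parsed

        while (parseStart < lowered.Length)
        {
            // Search from parseStart in the same string; no Substring copy.
            int hit = lowered.IndexOf(keyword, parseStart, StringComparison.Ordinal);
            if (hit < 0)
                break;

            hits.Add(hit);
            parseStart = hit + keyword.Length;
        }

        return hits;
    }
}
```

The loop ends as soon as IndexOf returns -1, which also replaces the Substring(...).Contains(...) check in the original while condition.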
The performance problems you have are in large part related to the overhead of all the string copy operations.
There are overloads that let you specify the valid range of a string operation. If you eliminate the copying by simply using an index to virtually substring the entire string, that will make a difference.
Also, case-insensitive comparisons are not made by lowercasing or uppercasing the string! You use the StringComparer class or the StringComparison enumeration. Many string methods have overloads that let you specify whether to consider case.
Indexing a string repeatedly using the square bracket notation is also very expensive. If you look at the implementation of the string operations in .NET, they always turn the search string into a char array because that is faster to work with. However, that means a lot of copying still takes place even for read-only search operations.
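As a rough illustration of the range and case-insensitivity overloads mentioned above, the sketch below finds where a class name ends: at "extends" if it appears on the current line, otherwise at the line break. The NameEndFinder class and FindNameEnd method are hypothetical names, and treating the end of the text as a line end is an assumption. It uses the String.IndexOf(string, int, int, StringComparison) overload, so the search for "extends" is limited to the current line and needs neither ToLower nor Substring.

```csharp
using System;

// Hypothetical helper for illustration; not part of the original question's code.
static class NameEndFinder
{
    // Returns the index where the class name that starts at pos ends.
    public static int FindNameEnd(string text, int pos)
    {
        // End of the current line; fall back to the end of the text (assumption).
        int lineEnd = text.IndexOf("\r\n", pos, StringComparison.Ordinal);
        if (lineEnd < 0)
            lineEnd = text.Length;

        // Case-insensitive search restricted to [pos, lineEnd); no copy is made.
        int ext = text.IndexOf("extends", pos, lineEnd - pos,
                               StringComparison.OrdinalIgnoreCase);

        return ext >= 0 ? ext : lineEnd;
    }
}
```

Restricting the search to one line also avoids scanning the rest of the file for "extends" when the current class does not have one.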
I'd recommend using a profiling tool to zero in on the part of your code that is slowing you down.
JetBrains dotTrace is one profiling product that has helped immensely with this kind of task.
In addition to the answer from Jon: as I understand it, anything in the while () condition of your code executes on every iteration. So it may be faster to find a way to avoid recalculating [...] on each iteration of the while loop.
Additionally, what exactly are you trying to parse? Is it a normal text file? You haven't given many details. One method I like to use for parsing text files is to load the entire file into an array of strings, using '\n' as the delimiter. Then I can quickly step through the array and parse the contents. If I need to, I can store an array index and quickly refer back to a previous line.
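The line-array approach described above might look something like this. It is a minimal sketch under assumptions: the LineScanner class and FindClassLines method are hypothetical names, and matching on the word "class" anywhere in a line is a simplification of what a real ddump parser would need.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the original question's code.
static class LineScanner
{
    // Splits the text into lines once, then returns the indices of the
    // lines that mention "class" (case-insensitive).
    public static List<int> FindClassLines(string entireFile)
    {
        // With "\r\n" line endings each line keeps a trailing '\r'.
        string[] lines = entireFile.Split('\n');

        var classLines = new List<int>();
        for (int i = 0; i < lines.Length; i++)
        {
            if (lines[i].IndexOf("class", StringComparison.OrdinalIgnoreCase) >= 0)
                classLines.Add(i);
        }

        return classLines;
    }
}
```

Because the result is a list of array indices, going back to an earlier line is just a matter of indexing into the same array again.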
First, you can change [...] to [...], then change [...] to [...]. As others suggested, [...] should be done once outside the while loop, and [...] needs to be done at the end of the while loop.