Speeding up a parsing algorithm

Posted on 2024-10-21 03:36:18


I'm trying to parse some ddump files. Could you please help me speed up my algorithm?
Each loop iteration takes 216 ms, which is far too much. I would like to get it down to around 40-50 ms per iteration. Maybe by using a regular expression?

Here is my algorithm:

    while (pos < EntireFile.Length && (/*curr = */EntireFile.Substring(pos, EntireFile.Length - pos)).Contains(" class"))
    {
        w.Reset();
        w.Start();
        pos = EntireFile.ToLower().IndexOf(" class", pos) + 6;
        int end11 = EntireFile.ToLower().IndexOf("extends", pos);
        if (end11 == -1)
            end11 = EntireFile.IndexOf("\r\n", pos);
        else
        {
            int end22 = EntireFile.IndexOf("\r\n", pos);
            if (end22 < end11)
                end11 = end22;
        }
        //string opcods = EntireFile.Substring(pos, EntireFile.Length - pos);
        string Cname = EntireFile.Substring(pos, end11 - pos).Trim();
        pos += (end11 - pos) + 7;
        pos = EntireFile.IndexOf("{", pos) + 1;

        int count = 1;
        string searching = EntireFile.Substring(pos, EntireFile.Length - pos);
        int searched = 0;
        while (count != 0)
        {
            if (searching[searched] == '{') count++;
            else if (searching[searched] == '}') count--;
            searched++;
        }
        string Content = EntireFile.Substring(pos, searched);
        tlist.Add(new TClass() { ClassName = Cname, Content = Content });
        pos += searched;
        if (pos % 3 == 0)
        {
            double prc = ((double)pos) * 100d / ((double)EntireFile.Length);
            int prcc = (int)Math.Round(prc);
            wnd.UpdateStatus(prcc);
            wnd.Update();
        }
        mils.Add((int)w.ElapsedMilliseconds);
    }

Any help would be greatly appreciated.


Comments (5)

梨涡少年 2024-10-28 03:36:19


Well, calling this several times

EntireFile.ToLower()

certainly does not help. There are several things you can do:

  1. Perform costly operations (ToLower, IndexOf, etc.) only once and cache the results where possible.
  2. Do not narrow down the input you are processing with Substring; that will kill your performance. Instead, keep a separate int parseStart value and pass it as an additional parameter to all of your IndexOf calls. In other words, track the position you have parsed up to yourself instead of taking a smaller substring each time.
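
The two points above could be sketched roughly as follows. This is only an illustration under a few assumptions: `ParsedClass` is a hypothetical stand-in for the `TClass` type in the question, the file is already loaded into one string, and the class name is taken to end at "extends", at the line break, or at the opening brace, whichever comes first. Unlike the question's loop, `Content` here excludes the closing brace.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the TClass type used in the question.
public sealed class ParsedClass
{
    public string ClassName { get; set; } = "";
    public string Content { get; set; } = "";
}

public static class IndexBasedParser
{
    public static List<ParsedClass> Parse(string text)
    {
        var result = new List<ParsedClass>();
        int parseStart = 0; // everything before this offset has been handled

        while (true)
        {
            // Search in place from parseStart; no ToLower copy, no Substring copy.
            int classAt = text.IndexOf(" class", parseStart, StringComparison.OrdinalIgnoreCase);
            if (classAt < 0) break;
            int nameStart = classAt + " class".Length;

            int open = text.IndexOf('{', nameStart);
            if (open < 0) break;

            // The name ends at "extends" or at the line break if either
            // occurs before the opening brace.
            int nameEnd = open;
            int ext = text.IndexOf("extends", nameStart, open - nameStart,
                                   StringComparison.OrdinalIgnoreCase);
            if (ext >= 0) nameEnd = ext;
            int eol = text.IndexOf('\n', nameStart, nameEnd - nameStart);
            if (eol >= 0) nameEnd = eol;

            // Count braces by indexing the original string directly.
            int depth = 1;
            int i = open + 1;
            while (i < text.Length && depth > 0)
            {
                char c = text[i];
                if (c == '{') depth++;
                else if (c == '}') depth--;
                i++;
            }
            if (depth != 0) break; // unbalanced braces: stop instead of running off the end

            // i is one past the closing brace; the only copies made are the results.
            result.Add(new ParsedClass
            {
                ClassName = text.Substring(nameStart, nameEnd - nameStart).Trim(),
                Content = text.Substring(open + 1, i - open - 2)
            });
            parseStart = i;
        }
        return result;
    }
}
```

Calling `IndexBasedParser.Parse(EntireFile)` once replaces the whole loop; progress reporting and timing from the question are left out to keep the sketch short.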

不如归去 2024-10-28 03:36:19


The performance problems you have are largely due to the overhead of all the string copy operations.

There are overloads that let you specify the valid range of a string operation. Eliminating the copying by using an index to take a virtual substring of the entire string will make a difference.

Also, case-insensitive comparisons are not done by lowercasing or uppercasing the string! Use the StringComparer class or the StringComparison enumeration. Many string methods have overloads that let you specify whether to consider case.
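
As a small illustration of that point, the two helpers below find the same token, but only the first one copies the text. The class and method names are made up for this sketch; `IndexOf(string, int, StringComparison)` is the standard overload.

```csharp
using System;

public static class CaseInsensitiveSearch
{
    // The pattern from the question: allocates a lowered copy of the
    // whole text on every call before searching it.
    public static int FindWithToLower(string text, string token, int start)
    {
        return text.ToLower().IndexOf(token.ToLower(), start);
    }

    // Searches the original string in place, ignoring case, starting at
    // the given offset. No copy of the text is made.
    public static int FindInPlace(string text, string token, int start)
    {
        return text.IndexOf(token, start, StringComparison.OrdinalIgnoreCase);
    }
}
```

For plain ASCII keywords such as " class" and "extends" both helpers should return the same index, so the in-place version can be swapped in without changing the rest of the loop.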

Indexing a string repeatedly with the square bracket notation also has a cost, since every access is bounds-checked, but that cost is small next to the copying. In the brace-counting loop, the expensive step is the Substring call that copies the rest of the file before the scan starts; indexing EntireFile directly from pos avoids it.

奢望 2024-10-28 03:36:19


I'd recommend using a profiling tool to zero in on the part of your code that is slowing you down.

JetBrains dotTrace is one profiling product that has helped immensely with this kind of task.
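
If a profiler is not at hand, a coarse approximation is to time each phase of the loop separately with Stopwatch, which the question already uses for the loop as a whole. The helper below is a hypothetical sketch, not a replacement for a real profiler; `PhaseTimer` and its phase names are invented for illustration, and it is not thread-safe.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class PhaseTimer
{
    // Accumulated Stopwatch ticks per named phase.
    private static readonly Dictionary<string, long> Ticks = new Dictionary<string, long>();

    // Runs the delegate, adds its elapsed time to the phase total,
    // and passes the delegate's result through unchanged.
    public static T Measure<T>(string phase, Func<T> work)
    {
        long started = Stopwatch.GetTimestamp();
        try
        {
            return work();
        }
        finally
        {
            long elapsed = Stopwatch.GetTimestamp() - started;
            Ticks.TryGetValue(phase, out long total);
            Ticks[phase] = total + elapsed;
        }
    }

    // Total time recorded for a phase, in milliseconds (0 if never measured).
    public static double TotalMilliseconds(string phase)
    {
        return Ticks.TryGetValue(phase, out long total)
            ? total * 1000.0 / Stopwatch.Frequency
            : 0.0;
    }
}
```

Wrapping, say, the ToLower call, the Substring call and the brace-counting loop in separate `Measure` calls would show which of them accounts for most of the 216 ms.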

单身狗的梦 2024-10-28 03:36:19


In addition to the answer from Jon: as I understand it, anything in the while () condition is evaluated on every iteration. So it may be faster to find a way to avoid recalculating

EntireFile.Substring(pos, EntireFile.Length - pos).Contains(" class")

on each iteration of the while loop. Additionally, what exactly are you trying to parse? Is it a normal text file? You haven't given many details. One method I like to use for parsing text files is to load the entire file into an array of strings, using '\n' as the delimiter. Then I can quickly step through the array and parse the contents. If I need to, I can store an array index and quickly refer back to a previous line.
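
A rough sketch of the line-array idea, assuming the whole file fits in memory as one string and that a class declaration sits on a single line. `LineScanner` and its methods are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class LineScanner
{
    // Splits on '\n' and drops the '\r' that Windows line endings leave behind.
    public static string[] SplitLines(string text)
    {
        string[] lines = text.Split('\n');
        for (int i = 0; i < lines.Length; i++)
            lines[i] = lines[i].TrimEnd('\r');
        return lines;
    }

    // Returns the indices of the lines that contain a class declaration,
    // so the caller can step forward or refer back to earlier lines by index.
    public static List<int> FindClassLines(string[] lines)
    {
        var hits = new List<int>();
        for (int i = 0; i < lines.Length; i++)
        {
            if (lines[i].IndexOf(" class", StringComparison.OrdinalIgnoreCase) >= 0)
                hits.Add(i);
        }
        return hits;
    }
}
```

The split itself copies the file once, so this trades one up-front allocation for cheap per-line checks afterwards.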

星軌x 2024-10-28 03:36:19


First, you can change

while (pos < EntireFile.Length && (/*curr = */EntireFile.Substring(pos, EntireFile.Length - pos)).Contains(" class"))
{
 ...
}

to

var loweredEntireFile = EntireFile.ToLower();

while (pos < loweredEntireFile.Length &&
       Regex.IsMatch(loweredEntireFile, " class", RegexOptions.IgnoreCase))
{
    ...

    // we just need to process the rest of the file
    loweredEntireFile = loweredEntireFile.Substring(pos, loweredEntireFile.Length - pos);
}

Then change

pos = EntireFile.ToLower().IndexOf(" class", pos) + 6;
int end11 = EntireFile.ToLower().IndexOf("extends", pos);

to

var matches = Regex.Matches(loweredEntireFile, " class", RegexOptions.IgnoreCase);
pos = matches[0].Index + 6; // skip past " class", as the original line does

matches = Regex.Matches(loweredEntireFile, "extends", RegexOptions.IgnoreCase);
var end11 = matches[0].Index;

As others suggested,

var loweredEntireFile = EntireFile.ToLower();

should be done once, outside the while loop, and

loweredEntireFile = loweredEntireFile.Substring(pos, loweredEntireFile.Length - pos);

needs to be done at the end of each iteration of the while loop.
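
A variation on the regex idea, sketched under assumptions: it keeps pos as an absolute offset and uses the instance method `Regex.Match(string, int)` to resume the search, so the string is neither lowered nor trimmed between iterations. The pattern (whitespace, "class", whitespace, then a run of word characters as the name) and the `RegexClassFinder` name are guesses for illustration; the real ddump format may need a different pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexClassFinder
{
    // Built once and reused; IgnoreCase replaces the ToLower call.
    // The pattern is an assumption about how class headers look.
    private static readonly Regex ClassHeader = new Regex(
        @"\sclass\s+(?<name>\w+)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the class names in the order they appear in the text.
    public static List<string> FindClassNames(string text)
    {
        var names = new List<string>();
        int pos = 0;
        while (pos < text.Length)
        {
            // Resume matching at pos within the original string.
            Match m = ClassHeader.Match(text, pos);
            if (!m.Success) break;
            names.Add(m.Groups["name"].Value);
            pos = m.Index + m.Length;
        }
        return names;
    }
}
```

Whether this beats plain IndexOf with StringComparison.OrdinalIgnoreCase would need measuring; for a fixed keyword the regex mostly buys convenience in capturing the name.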
