How can I split a string into an array of string arrays as fast as possible in C#?

Posted on 2025-02-12 21:09:32, 2137 characters, 1 view, 0 comments

Short description: Splitting a string takes way too long.

Longer description:
I need to extract information from a string looking like this:

...
5   1   12  1   1   1   466 1277    458 80  92  Assistance
2   1   13  0   0   0   1055    1277    1717    100 -1  
3   1   13  1   0   0   1055    1186    1717    191 -1  
4   1   13  1   1   0   1055    1277    1717    100 -1  
5   1   13  1   1   1   1055    1279    288 78  90  Vehicle
5   1   13  1   1   2   1489    1279    228 98  67  Lights
5   1   13  1   1   3   1856    1281    286 95  74  System
5   1   13  1   1   4   2284    1281    196 95  70  Apps
5   1   13  1   1   5   2618    1277    154 80  77  Info
...

(Side note: the string is the return value of the page.GetTsvText(0) method; page is returned by TesseractEngine.Process(image), so the string contains information about detected OCR strings, confidences, bounding box coordinates, etc.)

To make the information easier to use, I wrote a method that turns the string into an array of arrays of strings:

private string[][] getDataArray(string source)
{
    Stopwatch sw = new Stopwatch();
    sw.Start();

    Console.WriteLine(source);

    string[] rows = source.Split(new char[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
    int nrOfRows = rows.Length;
    string[][] result = new string[nrOfRows][];

    for (int i = 0; i < nrOfRows; i++)
    {
        // Second separator assumed to be a tab ('\t'): the pasted literal was several
        // characters wide, and a char literal can hold only one character.
        result[i] = rows[i].Split(new char[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
    }
    sw.Stop();
    Console.WriteLine(" $$ getDataArray() took: " + sw.ElapsedMilliseconds + " ms");
    return result;
}

Note: For some reason the string contains spaces that look longer than usual spaces. I copied it from the console log. The separator is a single character, not a tab, but it is wider than the usual space character.
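
One way to find out what the wide separator actually is would be to list the code points of every whitespace character in the string. The sketch below is only an illustration; `SeparatorInspector` and `DescribeWhitespace` are hypothetical names, not part of the Tesseract wrapper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SeparatorInspector
{
    // Returns each distinct whitespace character found in `source`,
    // formatted as a Unicode code point such as "U+0009" (tab) or "U+0020" (space).
    public static List<string> DescribeWhitespace(string source)
    {
        return source
            .Where(c => char.IsWhiteSpace(c))
            .Distinct()
            .Select(c => "U+" + ((int)c).ToString("X4"))
            .ToList();
    }
}
```

Printing `string.Join(", ", SeparatorInspector.DescribeWhitespace(source))` once shows which characters need to go into the separator array passed to `Split`.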

Problem:

  • When I measure the time from inside the method, it takes less than 1 ms.
  • When I measure the time from outside, like this:
stopwatch.Restart();


// Get data
string[][] data = getDataArray(page.GetTsvText(0));

stopwatch.Stop();
Console.WriteLine(" $$ $$ Got data array in: " + stopwatch.ElapsedMilliseconds + " ms");

it takes about 2000 ms.

Does the string initialisation take so long? How can I get it faster, like under 50 ms?
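
Because the argument page.GetTsvText(0) is evaluated before getDataArray starts its own stopwatch, the outer measurement covers both steps while the inner one covers only the split. A minimal sketch for timing each step separately follows; `StageTimer` and `Time` are hypothetical helper names introduced for illustration.

```csharp
using System;
using System.Diagnostics;

public static class StageTimer
{
    // Runs `step`, returns its result, and reports the elapsed
    // wall-clock time in milliseconds through `elapsedMs`.
    public static T Time<T>(Func<T> step, out long elapsedMs)
    {
        Stopwatch sw = Stopwatch.StartNew();
        T result = step();
        sw.Stop();
        elapsedMs = sw.ElapsedMilliseconds;
        return result;
    }
}

// Hypothetical usage with the names from the question:
//   string tsv = StageTimer.Time(() => page.GetTsvText(0), out long fetchMs);
//   string[][] data = StageTimer.Time(() => getDataArray(tsv), out long parseMs);
//   Console.WriteLine("fetch: " + fetchMs + " ms, parse: " + parseMs + " ms");
```

If the fetch step accounts for nearly all of the time, the split itself is not the bottleneck and rewriting it would not bring the total under 50 ms.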

Comments (1)

软糯酥胸 2025-02-19 21:09:32

By using LINQ:

string[][] result = source.Split('\n')
                          .Select(line => line.Split(new char[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
                          .ToArray(); // Select returns IEnumerable<string[]>, so ToArray() is needed for string[][]

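LINQ makes the split more concise, but it is unlikely to be faster than the loop in the question. That loop already finishes in under 1 ms, and page.GetTsvText(0) is evaluated before getDataArray starts its own stopwatch, so the remaining time of about 2000 ms most likely comes from that call rather than from the split.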
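
A self-contained version of the LINQ approach might look like the sketch below. `TsvParser` and `Parse` are hypothetical names; treating '\r' as a row separator and '\t' as a field separator are assumptions about the input, not something the question confirms.

```csharp
using System;
using System.Linq;

public static class TsvParser
{
    // Assumption: rows end in "\n" or "\r\n"; fields are separated by spaces or tabs.
    private static readonly char[] RowSeparators = { '\r', '\n' };
    private static readonly char[] FieldSeparators = { ' ', '\t' };

    // Splits the text into rows, then each row into its non-empty fields.
    public static string[][] Parse(string source)
    {
        return source
            .Split(RowSeparators, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Split(FieldSeparators, StringSplitOptions.RemoveEmptyEntries))
            .ToArray();
    }
}
```

Keeping the separator arrays in static fields avoids allocating a new char[] for every row, which is a small saving compared with the time spent producing the string in the first place.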
