How do I split a string into an array of string arrays as fast as possible in C#?
Short description: Splitting a string takes way too long.
Longer description:
I need to extract information from a string looking like this:
...
5 1 12 1 1 1 466 1277 458 80 92 Assistance
2 1 13 0 0 0 1055 1277 1717 100 -1
3 1 13 1 0 0 1055 1186 1717 191 -1
4 1 13 1 1 0 1055 1277 1717 100 -1
5 1 13 1 1 1 1055 1279 288 78 90 Vehicle
5 1 13 1 1 2 1489 1279 228 98 67 Lights
5 1 13 1 1 3 1856 1281 286 95 74 System
5 1 13 1 1 4 2284 1281 196 95 70 Apps
5 1 13 1 1 5 2618 1277 154 80 77 Info
...
(Side note: the string is returned by the page.GetTsvText(0) method; page is returned by TesseractEngine.Process(image), so the string contains information about the detected OCR strings, confidences, bounding-box coordinates, etc.)
To make the information easier to use, I wrote a method that turns the string into an array of arrays of strings:
private string[][] getDataArray(string source)
{
    // Time only the work done inside this method.
    Stopwatch sw = new Stopwatch();
    sw.Start();
    Console.WriteLine(source);
    // One entry per non-empty line.
    string[] rows = source.Split(new char[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);
    int nrOfRows = rows.Length;
    string[][] result = new string[nrOfRows][];
    for (int i = 0; i < nrOfRows; i++)
    {
        // Split each row into fields; the second separator was pasted from the console log (see the note after this method).
        result[i] = rows[i].Split(new char[] { ' ', ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
    sw.Stop();
    Console.WriteLine(" $$ getDataArray() took: " + sw.ElapsedMilliseconds + " ms");
    return result;
}
Note: For some reason the string contains spaces that look longer than the usual spaces. I copied the character from the console log. It is a single character, not a tab, but it is wider than the usual space character.
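One way to check what the wide character really is would be to print its Unicode code point instead of judging it by how the console renders it. The following is only a minimal sketch; SeparatorInspector and DescribeSeparators are hypothetical names that do not appear in the original code. If it reports U+0009, the character is a tab after all, which would fit the "Tsv" (tab-separated values) in GetTsvText, but that is an assumption to verify against the real string.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original post.
public static class SeparatorInspector
{
    // Returns the distinct characters of a line that are neither letters,
    // digits, nor '-', formatted as code points such as "U+0009" (tab)
    // or "U+0020" (ordinary space), in order of first appearance.
    public static List<string> DescribeSeparators(string line)
    {
        return line
            .Where(c => !char.IsLetterOrDigit(c) && c != '-')
            .Distinct()
            .Select(c => "U+" + ((int)c).ToString("X4"))
            .ToList();
    }
}
```

Calling SeparatorInspector.DescribeSeparators on one row of the real output and writing the result with string.Join(", ", ...) should show exactly which characters need to go into the Split separator array.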
Problem:
- When I measure the time from inside the method, it takes less than 1 ms.
- When I measure the time from outside, like this:
stopwatch.Restart();
// Get data
string[][] data = getDataArray(page.GetTsvText(0));
stopwatch.Stop();
Console.WriteLine(" $$ $$ Got data array in: " + stopwatch.ElapsedMilliseconds + " ms");
it takes about 2000 ms.
Does the string initialisation really take that long? How can I make it faster, say under 50 ms?
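The two measurements do not cover the same work. C# evaluates method arguments before the call, so page.GetTsvText(0) has already finished by the time the stopwatch inside getDataArray starts, while the outer stopwatch includes it. The missing time is therefore most likely spent in GetTsvText (presumably the OCR work itself, which is an assumption here) rather than in the split. Timing each step on its own would confirm this. A minimal sketch follows; StepTimer and Measure are hypothetical names introduced for illustration.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of the original post.
public static class StepTimer
{
    // Runs one step, stores its elapsed milliseconds in elapsedMs,
    // and returns whatever the step produced.
    public static T Measure<T>(Func<T> step, out long elapsedMs)
    {
        Stopwatch sw = Stopwatch.StartNew();
        T result = step();
        sw.Stop();
        elapsedMs = sw.ElapsedMilliseconds;
        return result;
    }
}

// Intended use with the names from the question (not compiled here):
//   string tsv = StepTimer.Measure(() => page.GetTsvText(0), out long tsvMs);
//   string[][] data = StepTimer.Measure(() => getDataArray(tsv), out long splitMs);
//   Console.WriteLine("GetTsvText: " + tsvMs + " ms, split: " + splitMs + " ms");
```

If tsvMs accounts for nearly all of the 2000 ms, optimising the split cannot bring the total under 50 ms; the time would have to be saved in the OCR step instead.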
Comments (1)
By using LINQ
LINQ has better performance and gives a faster result.
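The comment gives no code, so here is a hedged sketch of what a LINQ version of the split might look like. TsvRows and ToJaggedArray are hypothetical names, and treating '\t' and '\r' as separators is an assumption about the input, not something the question confirms. The sketch performs the same Split calls as the original loop, so any speed difference would need to be measured rather than assumed.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the original post.
public static class TsvRows
{
    // Allocated once instead of on every call.
    private static readonly char[] LineBreaks = { '\r', '\n' };
    // Assumption: fields are separated by spaces or tabs.
    private static readonly char[] FieldSeparators = { ' ', '\t' };

    // Splits the text into non-empty lines, then each line into non-empty fields.
    public static string[][] ToJaggedArray(string source)
    {
        return source
            .Split(LineBreaks, StringSplitOptions.RemoveEmptyEntries)
            .Select(row => row.Split(FieldSeparators, StringSplitOptions.RemoveEmptyEntries))
            .ToArray();
    }
}
```

As in the original method, RemoveEmptyEntries drops empty fields, so a row without recognised text simply ends up with fewer columns.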