Fastest way to get variable-length substrings from a file (C#)
I have a text file containing values that need to be extracted, and each value has a variable length. The length of each field is stored in a List<int>; this can change if there is a more efficient way.
The problem: what is the fastest way to get the variable-length substrings into a DataTable, given a List<int> of lengths?
Example text file contents:
Field1ValueField2ValueIsLongerField3Field4IsExtremelyLongField5IsProbablyTheLongestFieldOfThemAll
A1201605172B160349150816431572C16584D31601346427946121346E674306102966595346438476174959205395664
Example List<int>:
11, 19, 6, 21, 40
Example output DataTable:
Field 1 | Field 2 | Field 3 | Field 4 | Field 5 |
---|---|---|---|---|
Field1Value | Field2ValueIsLonger | Field3 | Field4IsExtremelyLong | Field5IsProbablyTheLongestFieldOfThemAll |
A1201605172 | B160349150816431572 | C16584 | D31601346427946121346 | E674306102966595346438476174959205395664 |
There is no pattern to the field values; they could be any alphanumeric value, and the fields can only be recovered via the length list.
My approach was as follows:
List<int> lengths = new() { 11, 19, 6, 21, 40 };
DataTable dataTable = new();
//Add a column for each field
foreach (int i in lengths)
{
    dataTable.Columns.Add();
}
//Read the file and slice each line into fields
using (StreamReader streamReader = new(fileName))
{
    string line;
    while ((line = streamReader.ReadLine()) != null)
    {
        //Create a new row for each line in the text file
        DataRow dataRow = dataTable.NewRow();
        //Starting index of the current substring
        int start = 0;
        //Walk the variable lengths, tracking the column index directly
        //(lengths.IndexOf(i) would be an O(n) scan and breaks on duplicate lengths)
        for (int col = 0; col < lengths.Count; col++)
        {
            //Set the value for that cell
            dataRow[col] = line.Substring(start, lengths[col]);
            //Advance past the current field
            start += lengths[col];
        }
        //Add the row to the DataTable
        dataTable.Rows.Add(dataRow);
    }
}
Is there a more efficient (time and/or memory) way of completing this task?
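A sketch of the kind of micro-optimizations that usually help with this pattern (the `FixedWidthLoader.Load` helper and its name are my own, not part of the question): index columns directly instead of searching the length list, reuse one `object[]` buffer per row, and wrap the load in `BeginLoadData`/`EndLoadData` so the table suspends notifications and index maintenance while rows stream in.

```csharp
using System.Collections.Generic;
using System.Data;
using System.IO;

static class FixedWidthLoader
{
    // Hypothetical helper: loads a fixed-width text file into a DataTable.
    public static DataTable Load(string fileName, IReadOnlyList<int> lengths)
    {
        var table = new DataTable();
        foreach (int _ in lengths)
        {
            table.Columns.Add();
        }

        var buffer = new object[lengths.Count]; // reused for every row
        table.BeginLoadData(); // suspend notifications during the bulk load
        foreach (string line in File.ReadLines(fileName))
        {
            int start = 0;
            for (int col = 0; col < lengths.Count; col++)
            {
                buffer[col] = line.Substring(start, lengths[col]);
                start += lengths[col];
            }
            table.Rows.Add(buffer); // copies the buffer's values into a new row
        }
        table.EndLoadData();
        return table;
    }
}
```

The `Substring` allocations themselves are hard to avoid, since `DataTable` cells store each field value as a separate string anyway; the wins here come from cheaper per-row bookkeeping, not from avoiding the copies.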
Are you producing that input string or that length array?
If yes:
If no:
When you do both the extracting and the parsing in the same loop, the extraction throughput drops. So you should offload the parsing work to other threads, perhaps handing off N fields at a time to tolerate the multi-threading synchronization latency.
If single-threaded extraction is too slow compared to multi-threaded parsing, you can try to vectorize the extraction: launch 128 character samplers at once, check whether each of them finds a prefix code, and do a reduction across them to find the first prefix among them (if they find more than one).
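The "offload the work to other threads" suggestion above could be sketched as a producer-consumer pipeline (this assumes the `fileName` and `lengths` variables from the question; the bounded capacity of 1024 is an arbitrary choice): one thread does nothing but read lines, while a second task slices each line into fields.

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

var lines = new BlockingCollection<string>(boundedCapacity: 1024);

// Consumer: slices each line into fields on a separate thread.
var parser = Task.Run(() =>
{
    foreach (string line in lines.GetConsumingEnumerable())
    {
        int start = 0;
        var fields = new string[lengths.Count];
        for (int col = 0; col < lengths.Count; col++)
        {
            fields[col] = line.Substring(start, lengths[col]);
            start += lengths[col];
        }
        // Hand the fields off to the DataTable here; with a single
        // consumer there is no locking around the table.
    }
});

// Producer: pure I/O, no parsing in this loop.
foreach (string line in File.ReadLines(fileName))
{
    lines.Add(line); // blocks if the parser falls 1024 lines behind
}
lines.CompleteAdding();
parser.Wait();
```

Whether this beats the single-threaded loop depends on the file size and disk speed; for small files the synchronization overhead can easily outweigh the parallelism.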