Easiest way to work with a comma-separated list
I'm about to build a solution where I receive a comma-separated list every night. It's a list with around 14000 rows, and I need to go through the list and select some of the values in it.
The document I receive is made up of around 50 semicolon-separated values for every "case". This is how the document is structured:
"";"2010-10-17";"";"";"";Period-Last24h";"Problem is that the customer cant find....";
and so on, with 43 more semicolon-separated values. Every "case" ends with the value "Total 515";
What I need to do is go through all these "cases" and extract some of the values from each one. The "cases" are always built up in the same order, and I know that it's always the 3rd, 15th and 45th semicolon-separated value that I need to extract.
How can I do this in the easiest way?
Comments (5)
I think you should decompose this problem into smaller problems. Here are the steps I'd take:
Don't worry about the "easiest" way. You need one way that works. Whatever you do, get something working first, and worry about making it the easiest, fastest, smallest, etc. later on.
Assuming the "rows" are lines and that you read line by line, your main tool should be string.Split:
Note that this is a simple approach that will fail if the content of any column can contain ';'.
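Below is a minimal sketch of this line-by-line approach. It is written in Python rather than .NET, with str.split playing the role of string.Split. It rests on a few assumptions:
- each case sits on its own line;
- every value is wrapped in double quotes;
- the wanted values are the 3rd, 15th and 45th (zero-based indices 2, 14 and 44).

The names pick_values and read_cases are hypothetical. Like the approach it illustrates, it breaks as soon as a value contains ';'.

```python
WANTED = (2, 14, 44)  # zero-based indices of the 3rd, 15th and 45th values


def pick_values(line, wanted=WANTED):
    """Split one row on ';' and return the wanted values without their quotes."""
    fields = line.rstrip("\r\n").split(";")
    # Naive split: a ';' inside a quoted value would shift every later index.
    return [fields[i].strip('"') for i in wanted]


def read_cases(lines, wanted=WANTED):
    """Apply pick_values to every non-blank line (e.g. an open text file)."""
    return [pick_values(line, wanted) for line in lines if line.strip()]
```

With a real file, read_cases can be given the open file object directly, for example inside with open(path, encoding="utf-8") as f:. Both the path and the encoding are assumptions about the nightly file.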
You could use String.Split twice. The first time, use "Total 515"; as the split string, using this overload. This will give you an array of cases.
The second time, use ";" as the split character, using this overload on each of the cases. This will give you a data array for each case. Since the data is consistent, you can extract the 3rd, 15th and 45th elements of each array (indices 2, 14 and 44, as arrays are zero-based).
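Below is a rough sketch of the two-pass split. It is in Python rather than .NET, with str.split standing in for String.Split. It assumes:
- the terminator appears literally as "Total 515"; in every case, quotes included like the other values;
- each case holds at least 45 values.

CASE_TERMINATOR and extract_cases are hypothetical names chosen for illustration.

```python
CASE_TERMINATOR = '"Total 515";'  # assumption: quoted like the other values
WANTED = (2, 14, 44)  # zero-based positions of the 3rd, 15th and 45th values


def extract_cases(text, terminator=CASE_TERMINATOR, wanted=WANTED):
    """Split the document into cases, then each case into values."""
    results = []
    # First split: the whole document into cases.
    for chunk in text.split(terminator):
        chunk = chunk.strip()
        if not chunk:
            continue  # e.g. the empty remainder after the last terminator
        # Second split: one case into its values, dropping surrounding quotes.
        fields = [f.strip().strip('"') for f in chunk.split(";")]
        results.append(tuple(fields[i] for i in wanted))
    return results
```

Because it never looks at line breaks, this version still works if a case happens to span several lines. Like any plain split, though, it is thrown off by a ';' inside a value.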
我会搜索现有的 csv 库。转义规则可能不容易映射到正则表达式。
如果我自己编写一个库,我首先将每一行解析为一个列表/字符串数组。然后在第二步(可能在 csv 库本身之外)将字符串列表转换为强类型对象。
I'd search for an existing csv library. The escaping rules are probably not that easily mapped to regex.
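Python's standard csv module is one example of such a library; it is used here in place of whatever library fits your platform. It already implements the quoting rules: a ';' inside a quoted value and a doubled quote ("") used as an escape are both handled. The sketch assumes ';' as the delimiter, '"' as the quote character and the same 3rd/15th/45th positions. extract_with_csv is a hypothetical name.

```python
import csv

WANTED = (2, 14, 44)  # zero-based positions of the 3rd, 15th and 45th values


def extract_with_csv(lines, wanted=WANTED):
    """Parse semicolon-separated, double-quoted rows; return the wanted values.

    `lines` can be any iterable of strings, e.g. a file opened with newline=""
    as the csv module documentation recommends.
    """
    reader = csv.reader(lines, delimiter=";", quotechar='"')
    # Skip rows too short to hold every wanted position (blank lines come back as []).
    return [[row[i] for i in wanted] for row in reader if len(row) > max(wanted)]
```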
If writing a library myself I'd first parse each line into a list/an array of strings. And then in a second step(probably outside of the csv library itself) convert the stringlist to a strongly typed object.
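Below is a sketch of that second step in Python. It assumes the first step has already produced one list of strings per row, for instance from a CSV reader. The Case class and its field names are hypothetical. Treating the 2nd value as an ISO date is also an assumption, based only on the "2010-10-17" in the question's sample row.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Case:
    """Hypothetical typed view of one case; field names are illustrative only."""
    reported: date  # 2nd value, assumed to be an ISO date such as "2010-10-17"
    value3: str     # 3rd value
    value15: str    # 15th value
    value45: str    # 45th value


def to_case(fields):
    """Second step: turn one already-parsed row (a list of strings) into a Case."""
    if len(fields) < 45:
        raise ValueError(f"expected at least 45 values, got {len(fields)}")
    return Case(
        reported=date.fromisoformat(fields[1]),
        value3=fields[2],
        value15=fields[14],
        value45=fields[44],
    )
```

Keeping the conversion separate from the parsing means the parser never needs to know what the columns mean. Only to_case changes if the layout of a case changes.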
A simple but slow approach would be reading single characters from the input (the StringReader class, for example). Write a ReadItem method that reads a quote, continues to read until the next quote, and then looks at the next character. If it is a newline or a semicolon, one item has been read. If it is another quote, add a single quote to the item being read and keep reading. Otherwise, throw an exception. Then use this method to split up the input data into a series of items, storing each line e.g. in a string[number of items in a row] and the lines in a List<>. Then you can use this class to read the CSV data inside another class that decodes the data read into objects that you can get your data out of.
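Below is a minimal Python sketch of this character-by-character reader. io.StringIO stands in for StringReader, and read_item plays the role of ReadItem. The names read_item, read_rows, parse and CsvFormatError are hypothetical. The sketch assumes:
- every value is quoted;
- lines end in a bare newline (no carriage returns).

It also tolerates the trailing ';' before the newline that the question's rows appear to have. Instead of a string[] per line inside a List<>, each row is a Python list inside a list.

```python
import io


class CsvFormatError(ValueError):
    """Raised when the input does not follow the all-values-quoted format."""


def read_item(reader):
    """Read one quoted item, one character at a time.

    Returns (value, terminator), where terminator is ";", a newline, or ""
    at end of input. value is None if the row ends before another item starts.
    """
    ch = reader.read(1)
    if ch in ("\n", ""):
        return None, ch  # nothing left on this row
    if ch != '"':
        raise CsvFormatError(f"expected an opening quote, got {ch!r}")
    chars = []
    while True:
        ch = reader.read(1)
        if ch == "":
            raise CsvFormatError("input ended inside a quoted item")
        if ch != '"':
            chars.append(ch)  # ordinary character, including ';' or newline inside quotes
            continue
        nxt = reader.read(1)  # found a quote: inspect the character after it
        if nxt == '"':
            chars.append('"')  # doubled quote -> one literal quote, keep reading
        elif nxt in (";", "\n", ""):
            return "".join(chars), nxt  # item complete
        else:
            raise CsvFormatError(f"unexpected {nxt!r} after closing quote")


def read_rows(reader):
    """Split the whole input into rows, each row a list of item strings."""
    rows, row = [], []
    while True:
        value, terminator = read_item(reader)
        if value is not None:
            row.append(value)
        if terminator == ";":
            continue  # more items on this row
        if row:  # a newline or end of input closes the row
            rows.append(row)
        row = []
        if terminator == "":
            return rows


def parse(text):
    """Convenience wrapper: parse a whole string via an in-memory stream."""
    return read_rows(io.StringIO(text))
```

Reading one character at a time is what makes this slow. However, it gives full control over the quoting rules without any regular expressions.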