Advanced .csv parsing - survey answer files?
OK, firstly I'd just like to point out that I'm aware of how to parse .csv files by splitting on commas, tabs, etc. I still have a problem, however, and I'm a little stuck.
What I'm trying to do is build an application that reads in a .csv survey answer file (preferably all extension types, but let's start with one). These survey answer files are pre-generated by other websites (i.e., the user downloads their survey answers from a survey site and then uses my application). The purpose of the application is to perform statistical analysis on the data.
So the problem I'm having is figuring out how to read the file in and separate the questions from the answers, and both from irrelevant text. I need a reusable way of doing this for multiple answer files with different question types, etc.
I know an easier method would be to have the user create the survey with my application and then analyze it, so that I control the formatting, but at the moment this is not an option.
NOTE: I plan on reading all the variables into the system and then allowing the user to select variables from a list and run analysis algorithms on them.
Again, I know there are advanced CSV readers out there; I'm just looking for ideas on how to go about my problem.
Comments (3)
Use
Microsoft.VisualBasic.FileIO.TextFieldParser
It is designed for parsing delimited text files such as .csv, and it handles commas inside quoted fields too.
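To illustrate what "handles commas in fields" means in practice, here is a minimal sketch. It is not TextFieldParser itself (that is a .NET class); it uses Python's standard csv module as an analogous reader, and the sample survey rows and the read_rows helper are made up for the example.

```python
import csv
import io

# Invented sample data: both the question and the answer contain commas,
# so they are wrapped in double quotes, as most CSV exporters do.
SAMPLE = (
    "Respondent,Question,Answer\n"
    '1,"How satisfied are you, overall?","Very, very satisfied"\n'
)


def read_rows(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    # csv.reader understands quoting, so a comma inside a quoted field
    # does not start a new field the way a naive split(",") would.
    return list(csv.reader(io.StringIO(text)))


rows = read_rows(SAMPLE)
```

A naive split on commas would break the second line into five pieces; a quote-aware reader keeps it at three, which is the main reason to prefer a real parser over string splitting.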
For parsing CSV, you could use the regular expression I describe in my solution to this post. It would be evaluated line by line.
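The expression from the linked post is not reproduced in this thread, so the following is only a sketch of the general idea, assuming comma delimiters and double-quoted fields with "" as the escaped quote. The pattern and the split_csv_line name are illustrative, not the answerer's actual regex.

```python
import re

# One field followed by a comma or the end of the line.
# Group 1: contents of a quoted field ("" stands for a literal quote).
# Group 2: an unquoted field (no commas or quotes allowed).
# Group 3: the separator, "," or "" at the end of the line.
FIELD = re.compile(r'(?:"((?:[^"]|"")*)"|([^,"]*))(,|$)')


def split_csv_line(line):
    """Split a single CSV line into fields, honoring quoted commas."""
    fields = []
    pos = 0
    while True:
        m = FIELD.match(line, pos)
        if m is None:
            raise ValueError(f"malformed CSV near column {pos}: {line!r}")
        quoted, bare, sep = m.groups()
        if quoted is not None:
            fields.append(quoted.replace('""', '"'))
        else:
            fields.append(bare)
        if sep == "":  # reached the end of the line
            return fields
        pos = m.end()
```

Because it works on one line at a time, this approach cannot handle quoted fields that contain line breaks, which some survey exports use for free-text answers.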
Does the first row of your file (CSV if comma-delimited, TSV if tab-delimited) hold the column names?
Do all rows have the same number of values (if necessary, with missing or null values indicated by consecutive delimiters)?
If the answer to both questions is yes, one option is to use ADO with the JET 4.0 driver to read each file as a relational data source.
There are plenty of samples that demonstrate the technique. Start here.
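As a rough sketch of the same idea outside ADO/JET, the snippet below checks the two conditions above and then loads the file into an in-memory SQLite table so it can be queried like a relational source. Python's sqlite3 is a swapped-in stand-in for the JET driver here, and the load_as_table name, the "survey" table name, and the sample data are assumptions made for the example.

```python
import csv
import io
import sqlite3


def quote_ident(name):
    """Quote a column or table name for use in SQLite SQL."""
    return '"' + name.replace('"', '""') + '"'


def load_as_table(text, table="survey"):
    """Load CSV text with a header row into an in-memory SQLite table."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        raise ValueError("file is empty, so there is no header row")
    header, data = rows[0], rows[1:]
    # Second question from the answer: every row must have as many
    # values as there are column names.
    if any(len(row) != len(header) for row in data):
        raise ValueError("rows do not all have the same number of values")
    columns = ", ".join(quote_ident(h) for h in header)
    marks = ", ".join("?" for _ in header)
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({columns})")
    conn.executemany(f"INSERT INTO {quote_ident(table)} VALUES ({marks})", data)
    return conn


# Invented sample: the empty q2 for respondent 1 is a missing value
# expressed by consecutive delimiters.
conn = load_as_table("id,q1,q2\n1,yes,\n2,no,5\n")
answers = conn.execute("SELECT q1 FROM survey ORDER BY id").fetchall()
```

All values are stored as text, and blank or duplicate column names are not handled, so a real survey export would likely need a cleanup pass on the header first.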