C#: parsing a Google Trends CSV with no apparent delimiter
I'm trying to parse CSV files from Google Trends, but there doesn't appear to be any delimiter between columns. Is there any way to get this working so that the data is separated into columns after parsing, or is the best I can do to have each row in a single column?
I've tried numerous csv readers:
http://www.codeproject.com/KB/database/CsvReader.aspx
http://www.stellman-greene.com/CSVReader/
I could try to substring out the data in each row, but that seems like a very poor solution.
Example csv file from google trends:
http://www.google.com/trends/viz?q=stackoverflow&date=all&geo=all&graph=all_csv&sort=0&sa=N
Anyone got any ideas?
Comments (3)
It seems to me the columns are delimited with tabs (U+0009), aren’t they? Just do
Looks to me like it's encoded in UTF-16 with a delimiter of tab (U+0009).
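If the file really is UTF-16 with tab delimiters, decoding it with the matching encoding before splitting may be all that is needed. The following is a minimal sketch under that assumption; the class name `TrendsDecodeSketch` and the method `DecodeTabSeparated` are hypothetical names chosen for illustration, and the code is not taken from either CSV library mentioned in the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class TrendsDecodeSketch
{
    // Assumption: rawBytes holds UTF-16 (little-endian) text whose
    // columns are separated by tab characters (U+0009).
    public static string[][] DecodeTabSeparated(byte[] rawBytes)
    {
        var rows = new List<string[]>();

        // Encoding.Unicode is .NET's name for UTF-16 little-endian.
        using (var reader = new StreamReader(new MemoryStream(rawBytes), Encoding.Unicode))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // One array element per tab-separated column.
                rows.Add(line.Split('\t'));
            }
        }

        return rows.ToArray();
    }
}
```

Reading the same bytes with a single-byte or UTF-8 decoder would leave a NUL character after every ASCII character, which could be why the columns appear to have no delimiter.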
There are 2 possible reasons why those libraries do not parse it well:
1. The first 4 lines could possibly "trick" those parsers into believing there are only 2 columns.
2. This is not really a CSV (Comma-Separated Values) file; tabs are used instead of commas.
It's easy and straightforward to write your own parser for this particular case (there are no escaped tabs in values):
1. Open the file.
2. Skip the first 5 lines.
3. For each line you read, split it by \t and get the column values.
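The steps above can be sketched roughly as follows. `TrendsRowParserSketch`, `ParseRows`, and the `headerLinesToSkip` parameter are hypothetical names introduced for illustration. The default of 5 comes from the answer rather than from any documented file layout, and skipping blank lines is an extra assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TrendsRowParserSketch
{
    // Skips a fixed number of leading lines, then splits every
    // remaining non-empty line on the tab character.
    // Assumption: values never contain escaped tabs.
    public static List<string[]> ParseRows(TextReader reader, int headerLinesToSkip = 5)
    {
        var rows = new List<string[]>();

        for (int i = 0; i < headerLinesToSkip; i++)
        {
            // Input ended before the header was fully skipped.
            if (reader.ReadLine() == null)
            {
                return rows;
            }
        }

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
            {
                continue; // ignore blank separator lines
            }
            rows.Add(line.Split('\t'));
        }

        return rows;
    }
}
```

For a file on disk, the reader could be created with `new StreamReader(path, Encoding.Unicode)` if the file is UTF-16 as suggested in the other comment. Taking a `TextReader` keeps the parsing logic independent of where the text comes from.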