Parsing a CSV file backwards

Posted on 2024-07-30 12:21:09

I have csv files with the following format:

CSV FILE
"a"             , "b"     , "c" , "d"
hello, world    , 1       , 2   , 3
1,2,3,4,5,6,7   , 2       , 456 , 87
h,1231232,3     , 3       , 45  , 44

The problem is that the first field has commas "," in it. I have no control over file generation, as that's the format I receive them in. Is there a way to read a CSV file backwards, from the end of line to the beginning?

I don't mind writing a little Python script to do so, if I'm pointed in the right direction.

Comments (8)

指尖上得阳光 2024-08-06 12:21:09

The rsplit string method splits a string starting from the right instead of the left, so it's probably what you're looking for (it takes an argument specifying the maximum number of splits):

line = "hello, world    , 1       , 2   , 3"
parts = line.rsplit(",", 3)
print(parts)  # prints ['hello, world    ', ' 1       ', ' 2   ', ' 3']

If you want to strip the whitespace from the beginning and end of each item in the split list, you can use the strip method with a list comprehension:

parts = [s.strip() for s in parts]
print(parts)  # prints ['hello, world', '1', '2', '3']
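
To apply this to a whole file rather than a single line, something along these lines might work. This is only a sketch: the function name `parse_right_aligned` and the `n_columns` parameter are made up for illustration, and it assumes every data row has exactly four columns with commas allowed only in the first one. Lines that yield fewer parts (such as a title line) are skipped.

```python
def parse_right_aligned(lines, n_columns=4):
    """Split each line from the right so extra commas stay in the first field."""
    rows = []
    for line in lines:
        # At most n_columns - 1 splits, counted from the right-hand side.
        parts = line.rstrip("\n").rsplit(",", n_columns - 1)
        parts = [p.strip() for p in parts]
        # Skip lines that do not have enough separators (e.g. a title line).
        if len(parts) == n_columns:
            rows.append(parts)
    return rows


sample = [
    "CSV FILE\n",
    "hello, world    , 1       , 2   , 3\n",
    "1,2,3,4,5,6,7   , 2       , 456 , 87\n",
]
for row in parse_right_aligned(sample):
    print(row)
```

With a real file, the same function can be passed an open file object, since iterating over it yields lines.
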
想念有你 2024-08-06 12:21:09

I don't fully understand why you want to read each line in reverse, but you could do this:

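import csv

# Strip the newline before reversing; otherwise it ends up at the start of the line.
with open("mycsvfile.csv") as f:
    reversedLines = [line.rstrip("\n")[::-1] for line in f]

reader = csv.reader(reversedLines)
for backwardRow in reader:
    # Fields arrive in reverse order, each with its characters reversed.
    lastField = backwardRow[0][::-1]
    secondToLastField = backwardRow[1][::-1]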
烂人 2024-08-06 12:21:09

Reverse the string first and then process it.

tmp = tmp[::-1]
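
A rough sketch of what "process it" could look like after the reversal, assuming four columns and commas only in the first one. The function name `split_by_reversal` is hypothetical. The idea is to split the reversed string from the left with a split limit, then undo the reversal on both the list and each piece.

```python
def split_by_reversal(line, n_columns=4):
    """Reverse the line, split from the left, then restore the original order."""
    reversed_parts = line[::-1].split(",", n_columns - 1)
    # Un-reverse the list order and the characters of every piece.
    return [part[::-1].strip() for part in reversed(reversed_parts)]


print(split_by_reversal("hello, world    , 1       , 2   , 3"))
# prints ['hello, world', '1', '2', '3']
```

This gives the same result as calling rsplit directly, so the reversal is mainly useful when the tool at hand can only split from the left.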

埖埖迣鎅 2024-08-06 12:21:09

From the sample you have provided, it looks like the "columns" are fixed-width. The first one (the one with commas) is 16 characters long, so why not read the file line by line, take the first 16 characters of each line as the value of the first column, and read the rest accordingly? Once you have each value, you can parse it further (trim whitespace, and so on).
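
A minimal sketch of the fixed-width idea in Python. The offsets in `COLUMN_SLICES` are read off the sample as posted (16 characters, a comma, 9 characters, a comma, 5 characters, a comma, then the rest), so they are an assumption and would need checking against the real files; the names are made up for illustration.

```python
# Assumed (start, end) character offsets of each column, taken from the sample.
COLUMN_SLICES = [(0, 16), (17, 26), (27, 32), (33, None)]


def split_fixed_width(line, slices=COLUMN_SLICES):
    """Cut a line at fixed character positions and trim each value."""
    line = line.rstrip("\n")
    return [line[start:end].strip() for start, end in slices]


print(split_fixed_width("hello, world    , 1       , 2   , 3"))
# prints ['hello, world', '1', '2', '3']
```

This only holds up if the producer really pads every column to the same width; one row with a longer first field would shift all the offsets.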

游魂 2024-08-06 12:21:09

Then that's not a CSV file; comma-separated means just that.

How can you be certain that it is not:

CSV FILE
"a"             , "b"     , "c" , "d"
hello           , world   , 1   , 2   , 3
1               , 2       , 3   , 4   , 5,6,7,2,456,87
h               , 1231232 , 3   , 3   , 45,44

If the file is as you indicate, then the first field should be surrounded by quotes. It looks odd that the field names are quoted while the fields containing commas are not.

I'm not a fan of fixing errors away from their source; I'd push back on whoever generates the data to deliver proper CSV, if that's what they claim it is.

痴者 2024-08-06 12:21:09

You could always do something with a regex, like this (Perl regex):

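#!/usr/bin/perl

use strict;
use warnings;
use IO::File;

if (my $file = IO::File->new("test.csv")) {
    foreach my $line (<$file>) {
        # The greedy first group absorbs the extra commas; skip lines that don't match.
        next unless $line =~ m/^(.*),(.*?),(.*?),(.*?)$/;
        print "[$1][$2][$3][$4]\n";
    }
} else {
    print "Unable to open test.csv\n";
}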

(The first is a greedy search, the last 3 are not)
Edit: posted full code instead of just the regex
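
Since the question mentions Python, here is roughly the same pattern with the `re` module. It is a sketch only: the function name `split_with_regex` is made up, and it assumes four columns with commas allowed only in the first.

```python
import re

# Greedy first group, three lazy groups: the first group takes every extra comma.
ROW_PATTERN = re.compile(r"^(.*),(.*?),(.*?),(.*?)$")


def split_with_regex(line):
    """Return the four fields of a line, or None if the line does not match."""
    match = ROW_PATTERN.match(line.rstrip("\n"))
    if match is None:
        return None
    return [group.strip() for group in match.groups()]


print(split_with_regex("h,1231232,3     , 3       , 45  , 44"))
# prints ['h,1231232,3', '3', '45', '44']
```

Returning None for non-matching lines leaves it to the caller to decide whether a line with fewer than three commas is an error or just a title line.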

冷︶言冷语的世界 2024-08-06 12:21:09

If you always expect the same number of columns, and only the first column can contain commas, just split on every comma and concatenate the excess columns at the beginning.

The problem is that the interface is ambiguous, and you can try to circumvent this, but the better solution is to try to get the interface fixed (which is often harder than creating several patches...).
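
A small sketch of the "concatenate the excess columns" workaround, assuming four expected columns. The function name `merge_leading_fields` and the `expected_columns` parameter are hypothetical.

```python
def merge_leading_fields(line, expected_columns=4):
    """Split on every comma, then glue the surplus pieces back into the first field."""
    fields = line.rstrip("\n").split(",")
    extra = len(fields) - expected_columns
    if extra < 0:
        raise ValueError("too few columns: %r" % line)
    # The first extra + 1 pieces all belong to the first column.
    first = ",".join(fields[: extra + 1])
    return [first.strip()] + [f.strip() for f in fields[extra + 1 :]]


print(merge_leading_fields("1,2,3,4,5,6,7   , 2       , 456 , 87"))
# prints ['1,2,3,4,5,6,7', '2', '456', '87']
```

Raising on short lines makes the ambiguity visible instead of silently producing misaligned rows.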

浸婚纱 2024-08-06 12:21:09

I agree with mr beer. That is a badly formed CSV file. Your best bet is to find another delimiter, stop overloading the commas, or quote/escape the commas that do not separate fields.
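
If the producer cannot be changed, one option is to repair the files once on receipt and store them with proper quoting. The sketch below assumes four columns with commas only in the first; the function name `requote_rows` is made up. It relies on Python's `csv.writer`, which by default quotes any field that contains the delimiter.

```python
import csv
import io


def requote_rows(lines, n_columns=4):
    """Rewrite loosely formatted lines as properly quoted CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)  # default quoting wraps fields containing commas
    for line in lines:
        parts = line.rstrip("\n").rsplit(",", n_columns - 1)
        if len(parts) != n_columns:
            continue  # skip title lines and anything else without enough commas
        # Drop padding and any literal quotes so the writer can re-quote cleanly.
        writer.writerow([p.strip().strip('"') for p in parts])
    return out.getvalue()


fixed = requote_rows(["hello, world    , 1       , 2   , 3\n"])
print(fixed)
# A standard CSV reader can now parse the result without special handling.
print(list(csv.reader(io.StringIO(fixed, newline=""))))
```

After this step the first field reads back as a single value, so downstream tools no longer need to know about the original quirk.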
