Parsing a CSV file backwards

Posted on 2024-07-30 12:21:09

I have CSV files in the following format:

CSV FILE
"a"             , "b"     , "c" , "d"
hello, world    , 1       , 2   , 3
1,2,3,4,5,6,7   , 2       , 456 , 87
h,1231232,3     , 3       , 45  , 44

The problem is that the first field contains commas. I have no control over how the files are generated; this is the format I receive them in. Is there a way to read a CSV file backwards, from the end of each line to the beginning?

I don't mind writing a little Python script to do so, if I'm pointed in the right direction.


指尖上得阳光 2024-08-06 12:21:09

The rsplit string method splits a string starting from the right instead of the left, so it's probably what you're looking for. Its second argument is the maximum number of splits to perform:

line = "hello, world    , 1       , 2   , 3"
parts = line.rsplit(",", 3)
print(parts)  # prints ['hello, world    ', ' 1       ', ' 2   ', ' 3']

If you want to strip the whitespace from the beginning and end of each item in the resulting list, you can use the strip method in a list comprehension:

parts = [s.strip() for s in parts]
print(parts)  # prints ['hello, world', '1', '2', '3']
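
Applied to every line of a file, the same rsplit idea might look like the sketch below. It assumes that only the first field can contain commas and that the header row has no embedded commas, so the column count can be taken from it. The helper name `split_from_right` and the in-memory `lines` list are hypothetical, used here only for illustration.

```python
def split_from_right(line, n_columns):
    """Split a line into n_columns fields; extra commas stay in the first field."""
    parts = line.rstrip("\r\n").rsplit(",", n_columns - 1)
    return [part.strip() for part in parts]


# Sample data copied from the question; in practice these lines would come from the file.
lines = [
    '"a"             , "b"     , "c" , "d"',
    "hello, world    , 1       , 2   , 3",
    "1,2,3,4,5,6,7   , 2       , 456 , 87",
    "h,1231232,3     , 3       , 45  , 44",
]

# Assumption: the header has no embedded commas, so a plain split gives the column count.
header = [name.strip().strip('"') for name in lines[0].split(",")]
rows = [split_from_right(line, len(header)) for line in lines[1:]]

for row in rows:
    print(dict(zip(header, row)))
```

Taking the split limit from the header avoids hard-coding 3, but it only holds while the trailing columns themselves never contain commas.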


想念有你 2024-08-06 12:21:09

I don't fully understand why you want to read each line in reverse, but you could do this:

import csv

with open("mycsvfile.csv", newline="") as f:
    # Strip the line ending before reversing; otherwise it ends up at the
    # start of the reversed string and confuses the csv reader.
    reversedLines = [line.rstrip("\r\n")[::-1] for line in f]

reader = csv.reader(reversedLines)
for backwardRow in reader:
    # Fields arrive in reverse order with reversed text, so flip each one back.
    # The pieces of the original first field are spread over the remaining entries.
    lastField = backwardRow[0][::-1]
    secondToLastField = backwardRow[1][::-1]


烂人 2024-08-06 12:21:09

Reverse the string first, then process it.

tmp = tmp[::-1]
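
One way to finish that thought is sketched below: reverse the line, split from the left with a limit, then reverse each piece and the list order back. It assumes four columns with extra commas only in the first one; `split_reversed` is a hypothetical helper name, not something from the original answer.

```python
def split_reversed(line, n_columns):
    """Reverse the line, split from the left, then restore field text and order."""
    backwards = line[::-1]
    # After reversal the last column comes first, so a left-to-right split with a
    # limit leaves the whole original first column (commas included) in the final piece.
    pieces = backwards.split(",", n_columns - 1)
    return [piece[::-1].strip() for piece in reversed(pieces)]


tmp = "h,1231232,3     , 3       , 45  , 44"
fields = split_reversed(tmp, 4)
print(fields)
```

The result matches what `tmp.rsplit(",", 3)` followed by stripping would give, so the explicit reversal is mostly useful when a left-to-right parser has to be reused.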


埖埖迣鎅 2024-08-06 12:21:09

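Judging from the sample you provided, the "columns" look fixed-width. The first one (the one with commas) is 16 characters long, so why not read the file line by line, take the first 16 characters of each line as the first column's value, and then parse the rest of the line accordingly? Once you have each value you can process it further (trim whitespace, etc.).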
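
A minimal sketch of the fixed-width idea follows. The width of 16 is taken from the sample in the question and is an assumption about the real files; `FIRST_COLUMN_WIDTH` and `split_fixed_first` are hypothetical names.

```python
FIRST_COLUMN_WIDTH = 16  # assumption taken from the sample; adjust if real files differ


def split_fixed_first(line, width=FIRST_COLUMN_WIDTH):
    """Take the first `width` characters as column one and split the rest on commas."""
    line = line.rstrip("\r\n")
    first = line[:width].strip()
    rest = line[width:].lstrip()
    # The remainder starts with the comma that separates column one from column two.
    if rest.startswith(","):
        rest = rest[1:]
    return [first] + [field.strip() for field in rest.split(",")]


sample = "1,2,3,4,5,6,7   , 2       , 456 , 87"
result = split_fixed_first(sample)
print(result)
```

This breaks as soon as a first-column value is longer than the assumed width, so it is worth checking that assumption against real data before relying on it.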


游魂 2024-08-06 12:21:09

Then that's not a CSV file; comma-separated means just that.

How can you be certain that it is not:

CSV FILE
"a"             , "b"     , "c" , "d"
hello           , world   , 1   , 2   , 3
1               , 2       , 3   , 4   , 5,6,7,2,456,87
h               , 1231232 , 3   , 3   , 45,44

If the file is as you indicate, then the first field should be surrounded by quotes. Oddly, the field names are quoted while the fields that actually contain commas are not.

I'm not a fan of fixing errors away from their source. I'd push back on whoever generates the data to deliver proper CSV, if that's what they claim it is.
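
If the producer cannot be changed, a one-time normalization into properly quoted CSV is one way to get the format this answer asks for. The sketch below assumes four columns with extra commas only in the first one, and uses `io.StringIO` as a stand-in for real input and output files; the variable names are illustrative.

```python
import csv
import io

raw_lines = [
    "hello, world    , 1       , 2   , 3",
    "1,2,3,4,5,6,7   , 2       , 456 , 87",
]

out = io.StringIO()  # stands in for a real output file
writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL)
for line in raw_lines:
    # Assumption: four columns, with extra commas only in the first one.
    fields = [field.strip() for field in line.rsplit(",", 3)]
    writer.writerow(fields)  # fields containing commas are quoted automatically

print(out.getvalue())

# Once quoted, the standard reader recovers the fields without special handling.
parsed = list(csv.reader(io.StringIO(out.getvalue())))
```

After this step any ordinary CSV tool can read the file, which keeps the workaround in a single place.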


痴者 2024-08-06 12:21:09

You could always do something with a regex, like this (Perl regex):

#!/usr/bin/perl

use strict;
use warnings;
use IO::File;

if (my $file = IO::File->new("test.csv", "r")) {
    while (my $line = <$file>) {
        chomp $line;
        # The greedy first group absorbs the extra commas;
        # the three lazy groups take one field each.
        if ($line =~ m/^(.*),(.*?),(.*?),(.*?)$/) {
            print "[$1][$2][$3][$4]\n";
        }
    }
} else {
    print "Unable to open test.csv\n";
}

(The first group is a greedy match; the last three are not.)
Edit: posted full code instead of just the regex
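
Since the question mentions Python, the same pattern can be sketched with the standard `re` module. The names `LINE_PATTERN` and `split_with_regex` are hypothetical, and the four-column layout is assumed from the sample data.

```python
import re

# Greedy first group, then three lazy groups, mirroring the Perl pattern.
LINE_PATTERN = re.compile(r"^(.*),(.*?),(.*?),(.*?)$")


def split_with_regex(line):
    """Return the four fields of a line, or None if it has fewer than three commas."""
    match = LINE_PATTERN.match(line.rstrip("\r\n"))
    if match is None:
        return None
    return [group.strip() for group in match.groups()]


print(split_with_regex("hello, world    , 1       , 2   , 3"))
print(split_with_regex("no commas here"))
```

Returning None for lines that do not match makes malformed rows easy to detect instead of silently printing stale or empty groups.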
