How to loop through a dictionary of dictionaries to organize CSV data in a specified way

Posted on 2025-01-17 13:32:04

I made a script that:

- takes data from a CSV file
- groups it by the values in the first column of the data file
- inserts the grouped data at a specified line of a separate template text file
- saves the file in as many copies as there are distinct values in the first column of the data file

The picture below shows how it works:

[Image: how the program works]

But there are two more things I need to do. When the same second-column value appears more than once within one of these separate files, the file should insert the value from the third column instead of repeating the second-column value. The picture below shows how it should look:

[Image: how the output should look]

I also need to add, somewhere in the file, the first-column value from the data file split on "_".
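A minimal sketch of the split, assuming every first-column value has the form <number>_<number> like the rows in the data file. split_key is a hypothetical helper name, and where the two parts end up in the template is left open, as in the question:

```python
# Split a first-column key such as "111_0" into its two parts.
# Assumes keys look like "<baseline>_<variant>"; split_key is a hypothetical name.
def split_key(key):
    # str.partition splits at the first "_" only and never raises
    baseline, _, variant = key.partition("_")
    return baseline, variant


print(split_key("111_0"))  # ('111', '0')
print(split_key("113_0"))  # ('113', '0')
```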

Here is the data file:

111_0,3005,QWE
111_0,3006,SDE
111_0,3006,LFR
111_1,3005,QWE
111_1,5345,JTR
112_0,3103,JPP
112_0,3343,PDK
113_0,2137,TRE
113_0,2137,OMG

And here is the code I made:

import shutil
 
with open("data.csv") as f:
    contents = f.read()
    contents = contents.splitlines()
 
values_per_baseline = dict()
 
for line in contents:
    key = line.split(',')[0]
    values = line.split(',')[1:]
    if key not in values_per_baseline:
        values_per_baseline[key] = []
    values_per_baseline[key].append(values)
 
for file in values_per_baseline.keys():
    x = 3
    shutil.copyfile("of.txt", (f"of_%s.txt" % file))
    filename = f"of_%s.txt" % file
    for values in values_per_baseline[file]:
        with open(filename, "r") as f:
            contents = f.readlines()
            contents.insert(x, '      o = ' + values[0] + '\n          ' + 'a = ' + values[1] +'\n')
        with open(filename, "w") as f:
            contents = "".join(contents)
            f.write(contents)
            f.close()

I have been trying to build something like a dictionary of dictionaries of lists, but I can't get the implementation right.
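One way to build the dictionary of dictionaries of lists described above is collections.defaultdict, which creates the inner dictionary and list the first time a key is seen. This is a sketch rather than the asker's code. It reads the sample rows from a string through io.StringIO so it runs on its own, and group_rows is a hypothetical name:

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the data file in the question.
DATA = """111_0,3005,QWE
111_0,3006,SDE
111_0,3006,LFR
111_1,3005,QWE
111_1,5345,JTR
112_0,3103,JPP
112_0,3343,PDK
113_0,2137,TRE
113_0,2137,OMG
"""


def group_rows(lines):
    """Return {first column: {second column: [third-column values]}}."""
    groups = defaultdict(lambda: defaultdict(list))
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        key, second, third = row
        groups[key][second].append(third)
    return groups


groups = group_rows(io.StringIO(DATA))
print(dict(groups["111_0"]))  # {'3005': ['QWE'], '3006': ['SDE', 'LFR']}
```

With the real file, the same function should work if you pass open("data.csv", newline="") in place of the StringIO object.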

Answer by 柠檬树下少年和吉他, 2025-01-24 13:32:04

When I run your code, I get this error:

    contents.insert(x, '      o = ' + values[0] + '\n          ' + 'a = ' + values[3] +'\n')
IndexError: list index out of range

Let's think about where this error is coming from. It is an IndexError on a list. The only list being indexed on this line is values, so that seems like a good place to start looking.

To debug, you can add something like this just before the line that raises the error:

            print(values)
            print(values[0])
            print(values[3])

which gives

['3005', 'QWE']
3005
Traceback (most recent call last):
  File "qqq.py", line 25, in <module>
    print(values[3])
IndexError: list index out of range

So the problem is with values[3], which makes sense: len(values) == 2, so the only valid indices are 0 and 1. If we change values[3] to values[1], I think you get what you want, e.g.:

$ cat of_111_0.txt
line
line
line
      o = 3006
          a = LFR
      o = 3006
          a = SDE
      o = 3005
          a = QWE
line
line
line
line
line
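Look at the order in that output. The rows come out reversed: 3006/LFR appears before 3006/SDE and 3005/QWE. That happens because every contents.insert(x, ...) call uses the same index, so each new entry lands in front of the previous one. The sketch below shows the effect. The function names and the placeholder template lines are hypothetical:

```python
def insert_fixed(template, items, x):
    """Insert every item at the same index x, as the question's loop does."""
    lines = list(template)
    for item in items:
        lines.insert(x, item)  # each new item pushes the earlier ones down
    return lines


def insert_in_order(template, items, x):
    """Advance the index after each insert so items keep their original order."""
    lines = list(template)
    for offset, item in enumerate(items):
        lines.insert(x + offset, item)
    return lines


template = ["head", "head", "head", "tail"]
items = ["A", "B", "C"]
print(insert_fixed(template, items, 3))     # ['head', 'head', 'head', 'C', 'B', 'A', 'tail']
print(insert_in_order(template, items, 3))  # ['head', 'head', 'head', 'A', 'B', 'C', 'tail']
```

Slice assignment (lines[x:x] = items) is an equivalent one-step way to insert a whole list in order.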

To get to the next step in your problem, I would suggest you change your first loop to:

for line in contents:
    key = line.split(',')[0]
    values = line.split(',')[1:]
    if key not in values_per_baseline:
        values_per_baseline[key] = {}
    if values[0] not in values_per_baseline[key]:
        values_per_baseline[key][values[0]] = values[1]
    else:
        values_per_baseline[key][values[0]] += '<COMMA>' + values[1]

That makes your dictionary:

{'111_0': {'3005': 'QWE', 
           '3006': 'SDE<COMMA>LFR'}, 
 '111_1': {'3005': 'QWE', 
           '5345': 'JTR'}, 
 '112_0': {'3103': 'JPP', 
           '3343': 'PDK'}, 
 '113_0': {'2137': 'TRE<COMMA>OMG'}}

Then when writing to the file, you would need to change your loop to:

        sp = ' '  # a single space; 6*sp and 10*sp reproduce the original indentation
        for key in values_per_baseline[file]:
            contents.insert(x, f'{6*sp}o = {key}\n{10*sp}a = {values_per_baseline[file][key]}\n')

And your file now looks like:

line
line
line
      o = 3006
          a = SDE<COMMA>LFR
      o = 3005
          a = QWE
line
line
line
line
line
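The <COMMA> placeholder above marks where the grouped third-column values are joined. You can store the values as lists instead of concatenating strings. The join then happens once, when the file is written. Building the whole block as one string also means a single insert keeps the rows in their original order. This is a sketch under those assumptions. format_block is a hypothetical helper, and the ", " separator is a guess at what the template needs:

```python
# One group from the nested dictionary, written out literally so the example
# runs on its own: second-column value -> list of third-column values.
group = {"3006": ["SDE", "LFR"], "3005": ["QWE"]}

sp = " "  # single space; 6*sp and 10*sp reproduce the indentation used above


def format_block(group, separator=", "):
    """Build the text to insert into the template for one output file.

    Each second-column value is written once; its third-column values are
    joined with `separator` (assumed; replace it with whatever you need).
    """
    return "".join(
        f"{6*sp}o = {second}\n{10*sp}a = {separator.join(thirds)}\n"
        for second, thirds in group.items()
    )


print(format_block(group), end="")
```

Because format_block returns one string, a single contents.insert(x, format_block(group)) call is enough. When the file is written with "".join(contents), the embedded newlines split the block back into separate lines.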

Other things you could do

Now, there are a couple of things you can do to streamline your code while keeping it readable.*

  • On lines 10 and 11, there is no need to call line.split twice. Just add a line such as split_line = line.split(',') and then set key = split_line[0] and values = split_line[1:]. (You could do away with key and values altogether and just reference split_line[0] and split_line[1:], but that would make your code less readable.)
  • On line 17, you are redefining x on every iteration of the loop. Just move it out of the loop.
  • On lines 18 and 19, you first build the name with (f"of_%s.txt" % file) inside the copyfile call and then build the same name again for filename on the next line. I suggest you define filename first and then just call shutil.copyfile("of.txt", filename). Also, mixing the f prefix with % formatting is not how f-strings are meant to be used; the f does nothing here. You could just write filename = f"of_{file}.txt".
  • On line 23, you could change your insert command to an f-string (if you find it more readable), with sp = ' ' defined beforehand. For example: contents.insert(x, f'{6*sp}o = {values[0]}\n{10*sp}a = {values[1]}\n')
  • At the end, in your for file in values_per_baseline.keys() loop, you open and close each file once per row, which is far more often than you need to. You can reorder your operations so that each file is read once and written once:
    with open(filename, "r") as f:
        contents = f.readlines()
    for values in values_per_baseline[file]:
        contents.insert(x, '      o = ' + values[0] + '\n          ' + 'a = ' + values[1] + '\n')
    with open(filename, "w") as f:
        f.write("".join(contents))  # the with block closes the file, so no f.close() is needed

*For a short script like this, I would argue that making sure it is readable is more important than making sure it is efficient, since you will want to be able to come back in 3 weeks or 3 years and understand what you did. For that reason, I would also recommend you comment what you did.
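Putting the suggestions from the list together, a sketch of the whole script might look like the following. It is one possible arrangement, not a drop-in replacement. It reads the template once and builds each output in memory instead of copying it with shutil.copyfile. It assumes INSERT_AT = 3 matches the x in the question and joins repeated third-column values with ", " (also an assumption). The function names and the out_dir parameter are hypothetical:

```python
import csv
import os
from collections import defaultdict

INSERT_AT = 3  # template line index where the block is inserted (x in the question)
sp = " "       # single space; 6*sp and 10*sp give the indentation


def read_groups(csv_path):
    """Group rows as {first column: {second column: [third-column values]}}."""
    groups = defaultdict(lambda: defaultdict(list))
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if not row:  # skip blank lines
                continue
            key, second, third = row  # each line is split only once
            groups[key][second].append(third)
    return groups


def write_outputs(csv_path="data.csv", template_path="of.txt", out_dir="."):
    with open(template_path) as f:  # read the template once, outside the loop
        template = f.readlines()
    for key, group in read_groups(csv_path).items():
        # One block per file; repeated second-column values appear only once.
        block = "".join(
            f"{6*sp}o = {second}\n{10*sp}a = {', '.join(thirds)}\n"
            for second, thirds in group.items()
        )
        contents = template[:INSERT_AT] + [block] + template[INSERT_AT:]
        filename = os.path.join(out_dir, f"of_{key}.txt")  # plain f-string
        with open(filename, "w") as f:  # one write per output file
            f.write("".join(contents))
```

Call write_outputs() from the directory that contains data.csv and of.txt. It writes of_111_0.txt, of_111_1.txt and so on next to them.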
