嵌套字典和 Openpyxl

发布于 2025-01-14 00:41:26 字数 2282 浏览 2 评论 0原文

我正在尝试从 xlsx 文件中提取数据，但遇到了一些问题。我的问题在于最后一个循环。应在字典中重复分配的值。所有类别的每个子类别的值都将被覆盖在其自身和所有其他类别上，而不是为每个类别和子类别提供相应的值。

我的直觉告诉我这与词典交互的方式有关，但我的经验不允许我自己解决这个问题。

提前致谢。

代码如下

#Excelreader.py
import openpyxl, os, pprint
os.chdir('Directory')

print('Opening workbook... ')
wb = openpyxl.load_workbook('Example.xlsx')
sheet = wb['Sheet']
all_dic ={}
part_dic = {}

print('Reading ')

#Assigning the categories to the keys of the main dictionary
for rows in range(2, sheet.max_row + 1):
    cat = sheet.cell(row = rows, column = 1).value
    all_dic[cat] = {} 

#Making a subdictionary for each subcategory
for col in range(2, sheet.max_column +1):
    subcat = sheet.cell(row = 1, column = col).value
    part_dic[subcat] = float

#Filling out the main dictionary with subdictionaries    
for key in all_dic.keys():
    all_dic[key] = part_dic

#Reading all the data from the cells and assigning the correspondent values in each subdictionary
for rows in range(2, sheet.max_row + 1):
    for col in range(2, sheet.max_column +1):
        cat = sheet.cell(row = rows, column = 1).value
        subcat = sheet.cell(row = 1, column = col).value
        amount = sheet.cell(row = rows, column = col).value
        all_dic[cat][subcat] = amount

#Writing the dictionary into a .py that will be imported in the main script
print('Writing results... ')
resultFile = open('Dump.py','w')
resultFile.write('all_data = '+ pprint.pformat(all_dic))
resultFile.close()
print('Done')

dump.py 文件如下所示

all_data = {'Cat A': {'Subcat K': 1,
          'Subcat L': 2,
          'Subcat X': 3,
          'Subcat Y': 0,
          'Subcat Z': 0},
'Cat B': {'Subcat K': 1,
          'Subcat L': 2,
          'Subcat X': 3,
          'Subcat Y': 0,
          'Subcat Z': 0},
'Cat C': {'Subcat K': 1,
          'Subcat L': 2,
          'Subcat X': 3,
          'Subcat Y': 0,
          'Subcat Z': 0},
'Cat D': {'Subcat K': 1,
          'Subcat L': 2,
          'Subcat X': 3,
          'Subcat Y': 0,
          'Subcat Z': 0},
'Cat E': {'Subcat K': 1,
          'Subcat L': 2,
          'Subcat X': 3,
          'Subcat Y': 0,
          'Subcat Z': 0}}

excel 文件如下所示以下

原文

I'm trying to extract data from a xlsx file and I've been having some issues.
My issue lies in the last loop. The values that should be assigned in the dictionary repeat. Instead of having the corresponding values for each category and subcategory, the values for each subcategory of all categories is being overwritten on themselves and all other categories.

My hunch tells me it has to do with the way dictionaries interact but my experience doesn't allow me to solve it by myself.

Thanks in advance.

The code is the following

#Excelreader.py
import openpyxl, os, pprint
os.chdir('Directory')

print('Opening workbook... ')
wb = openpyxl.load_workbook('Example.xlsx')
sheet = wb['Sheet']
all_dic ={}
part_dic = {}

print('Reading ')

#Assigning the categories to the keys of the main dictionary
for rows in range(2, sheet.max_row + 1):
    cat = sheet.cell(row = rows, column = 1).value
    all_dic[cat] = {} 

#Making a subdictionary for each subcategory
for col in range(2, sheet.max_column +1):
    subcat = sheet.cell(row = 1, column = col).value
    part_dic[subcat] = float

#Filling out the main dictionary with subdictionaries    
for key in all_dic.keys():
    all_dic[key] = part_dic

#Reading all the data from the cells and assigning the correspondent values in each subdictionary
for rows in range(2, sheet.max_row + 1):
    for col in range(2, sheet.max_column +1):
        cat = sheet.cell(row = rows, column = 1).value
        subcat = sheet.cell(row = 1, column = col).value
        amount = sheet.cell(row = rows, column = col).value
        all_dic[cat][subcat] = amount

#Writing the dictionary into a .py that will be imported in the main script
print('Writing results... ')
resultFile = open('Dump.py','w')
resultFile.write('all_data = '+ pprint.pformat(all_dic))
resultFile.close()
print('Done')

The dump.py file looks like this

all_data = {'Cat A': {'Subcat K': 1,
          'Subcat L': 2,
          'Subcat X': 3,
          'Subcat Y': 0,
          'Subcat Z': 0},
'Cat B': {'Subcat K': 1,
          'Subcat L': 2,
          'Subcat X': 3,
          'Subcat Y': 0,
          'Subcat Z': 0},
'Cat C': {'Subcat K': 1,
          'Subcat L': 2,
          'Subcat X': 3,
          'Subcat Y': 0,
          'Subcat Z': 0},
'Cat D': {'Subcat K': 1,
          'Subcat L': 2,
          'Subcat X': 3,
          'Subcat Y': 0,
          'Subcat Z': 0},
'Cat E': {'Subcat K': 1,
          'Subcat L': 2,
          'Subcat X': 3,
          'Subcat Y': 0,
          'Subcat Z': 0}}

And the excel file looks like the following

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

清风不识月 2025-01-21 00:41:26

当您执行命令 all_dic[key] = part_dic 时，您会将字典链接到同一对象，因此当您更新“Cat A”时，它还会更新 Cats B、C、D 和 E 等每只猫。然后，字典将包含所有猫的最后一行值。
使用复制来制作新对象

#Filling out the main dictionary with subdictionaries
for key in all_dic.keys():
    # all_dic[key] = part_dic
    all_dic[key] = part_dic.copy()

只是为了兴趣。您不需要所有循环来创建“all_dic”字典，只需循环一个就可以完成所有操作。如果我正确地阅读了所有内容，那么对下面的代码进行的更改将实现相同的效果；

#Excelreader.py
import openpyxl, os, pprint
os.chdir('Directory')

print('Opening workbook... ')
wb = openpyxl.load_workbook('Example.xlsx')
sheet = wb['Sheet']
all_dic = {}
part_dic = {}

print('Reading ')

#Assigning the categories to the keys of the main dictionary
for rows in range(2, sheet.max_row + 1):
    for col in range(2, sheet.max_column + 1):
        cat = sheet.cell(row = rows, column = 1).value
        subcat = sheet.cell(row = 1, column = col).value
        amount = sheet.cell(row=rows, column=col).value
        # print("Category: '" + cat + "' Subcat: '" + subcat + "' Amount: '" + str(amount) + "'")
        part_dic[subcat] = amount

    all_dic[cat] = part_dic.copy()

#Writing the dictionary into a .py that will be imported in the main script
print('Writing results... ')
resultFile = open('dump.py', 'w')
resultFile.write('all_data = '+ pprint.pformat(all_dic))
resultFile.close()
print('Done')

When you execute the command all_dic[key] = part_dic you link the dictionaries to the same object so when you are updating 'Cat A' it also updates Cats B, C, D and E and so on with each Cat. The dictionary will then contain the last row values for all Cats.
Use copy instead to make a new object

#Filling out the main dictionary with subdictionaries
for key in all_dic.keys():
    # all_dic[key] = part_dic
    all_dic[key] = part_dic.copy()

Just for interest. You don't need all the loops to create the 'all_dic' dictionary just loop one should do everything. If I'm reading everything correctly the changes to your code below will achieve the same thing;

#Excelreader.py
import openpyxl, os, pprint
os.chdir('Directory')

print('Opening workbook... ')
wb = openpyxl.load_workbook('Example.xlsx')
sheet = wb['Sheet']
all_dic = {}
part_dic = {}

print('Reading ')

#Assigning the categories to the keys of the main dictionary
for rows in range(2, sheet.max_row + 1):
    for col in range(2, sheet.max_column + 1):
        cat = sheet.cell(row = rows, column = 1).value
        subcat = sheet.cell(row = 1, column = col).value
        amount = sheet.cell(row=rows, column=col).value
        # print("Category: '" + cat + "' Subcat: '" + subcat + "' Amount: '" + str(amount) + "'")
        part_dic[subcat] = amount

    all_dic[cat] = part_dic.copy()

#Writing the dictionary into a .py that will be imported in the main script
print('Writing results... ')
resultFile = open('dump.py', 'w')
resultFile.write('all_data = '+ pprint.pformat(all_dic))
resultFile.close()
print('Done')

回复收藏 0 原文

~没有更多了~