ValueError：长度不匹配 - 尝试读取PANDAS中有多个纸的多个XLSX文件时？

发布于 2025-02-01 13:01:22 字数 2609 浏览 2 评论 0原文

我正在尝试阅读多个XLSX每个有51张纸的文件，我想在一个数据范围内与Pandas一起阅读，重新格式化和串联。但是，我能够正确读取一个xlsx，使用51张纸，但是当我使用相同的逻辑来读取多个xlsx每个具有51张纸的文件时，我会遇到以下错误：

---------------------------------------------------------------------- --------------------------------- valueerror Trackback（最近的电话最后）输入[31]，in 12印刷（名称） 13印刷（Len（名称）） ---＆gt; 14 src_df =（pd.concat（{k：df.Assign（state = df.columns [0]）。set_axis（['county''，'eartientation'eartientation'，'state']， axis = 1）对于k，df in pd.read_excel（xls，sheet_name =）名称）.items（）}）。rename_axis（['st'，none]，，轴= 0）.Reset_index（level = 0）） 15 dfs.append（src_df）
在[31]中输入（.0） 12印刷（名称） 13印刷（Len（名称）） ---＆gt; 14 src_df =（pd.concat（{k：df.Assign（state = df.columns [0]）。set_axis（['county''，'eartientation'eartientation'，'state']， axis = 1）对于k，df in pd.read_excel（xls，sheet_name =）名称）.items（）}）。rename_axis（['st'，none]，，轴= 0）.Reset_index（level = 0）） 15 dfs.append（src_df）..
valueerror：长度不匹配：预期轴有4个元素，新值有3个元素

我当前的尝试：

这是示例数据对于我的尝试：

VenueMap_Counties_04-02-15.xlsx
VenueMap_Counties_06-13-18.xlsx
VenueMap_Counties_11-04-19.xlsx

这是我尝试阅读带有多张纸的XLSX文件中的尝试：

files = glob.glob('https://github.com/adamFlyn/test_rl/*.xlsx')
files_xlsx = [f for f in files if f[-4:] == 'xlsx']
if len(files_xlsx)>0:
    print("more than one xlsx files found")

# read multiple xlsx files into one
dfs = pd.DataFrame()
for f in files_xlsx:
    xls = pd.ExcelFile(f)
    sheetnames = xls.sheet_names[1:]
    print(sheetnames)
    print(len(sheetnames))
    src_df = (pd.concat({k: df.assign(State=df.columns[0]).set_axis(['County','Orientation','State'], axis=1) for k, df in pd.read_excel(xls, sheet_name = sheetnames).items()}).rename_axis(['ST', None], axis=0).reset_index(level=0))
    dfs.append(src_df)

有人有任何想法如何修复此值吗？有什么想法吗？

所需的输出：

在我所需的输出中，我想要index（['st'，'county'，'entientation'，'state']，dtype ='object'）如最终数据帧中的列。这是预期的输出数据框架的前5行：

desired_outputDf.head().to_dict()= {'ST': {0: 'AK', 1: 'AK', 2: 'AK', 3: 'AK', 4: 'AK'},
 'County': {0: 'ALEUTIANS EAST',
  1: 'ALEUTIANS WEST',
  2: 'ANCHORAGE',
  3: 'BETHEL',
  4: 'BRISTOL BAY'},
 'Orientaion': {0: 'Plaintiff ',
  1: 'Plaintiff ',
  2: 'Neutral',
  3: 'Plaintiff ',
  4: 'Plaintiff '},
 'State': {0: 'Alaska', 1: 'Alaska', 2: 'Alaska', 3: 'Alaska', 4: 'Alaska'}}

谁能指出我如何解决此问题？有什么想法吗？谢谢

原文

I am trying to read multiple xlsx files where each has 51 sheets and I want to read, reformat and concatenate them in one dataframe with pandas. However, I am able to read one xlsx with 51 sheets correctly but when I use same logic for reading multiple xlsx files where each has 51 sheets, I am getting following error:

--------------------------------------------------------------------------- ValueError Traceback (most recent call
last) Input In [31], in
12 print(names)
13 print(len(names))
---> 14 src_df = (pd.concat({k: df.assign(State=df.columns[0]).set_axis(['County','Orientation','State'],
axis=1) for k, df in pd.read_excel(xls, sheet_name =
names).items()}).rename_axis(['ST', None],
axis=0).reset_index(level=0))
15 dfs.append(src_df)
Input In [31], in (.0)
12 print(names)
13 print(len(names))
---> 14 src_df = (pd.concat({k: df.assign(State=df.columns[0]).set_axis(['County','Orientation','State'],
axis=1) for k, df in pd.read_excel(xls, sheet_name =
names).items()}).rename_axis(['ST', None],
axis=0).reset_index(level=0))
15 dfs.append(src_df) .. ..
ValueError: Length mismatch: Expected axis has 4 elements, new values
have 3 elements

my current attempt:

here are the example data that I used for my attempt:

VenueMap_Counties_04-02-15.xlsx
VenueMap_Counties_06-13-18.xlsx
VenueMap_Counties_11-04-19.xlsx

and here is my attempt to read those xlsx files with multiple sheets into one dataframe:

files = glob.glob('https://github.com/adamFlyn/test_rl/*.xlsx')
files_xlsx = [f for f in files if f[-4:] == 'xlsx']
if len(files_xlsx)>0:
    print("more than one xlsx files found")

# read multiple xlsx files into one
dfs = pd.DataFrame()
for f in files_xlsx:
    xls = pd.ExcelFile(f)
    sheetnames = xls.sheet_names[1:]
    print(sheetnames)
    print(len(sheetnames))
    src_df = (pd.concat({k: df.assign(State=df.columns[0]).set_axis(['County','Orientation','State'], axis=1) for k, df in pd.read_excel(xls, sheet_name = sheetnames).items()}).rename_axis(['ST', None], axis=0).reset_index(level=0))
    dfs.append(src_df)

Does anyone have any ideas how to fix this valueError? Any thoughts?

desired output:

in my desired output, I want Index(['ST', 'County', 'Orientation', 'State'], dtype='object') as of columns in final dataframe. this is top 5 rows of desired output dataframe:

desired_outputDf.head().to_dict()= {'ST': {0: 'AK', 1: 'AK', 2: 'AK', 3: 'AK', 4: 'AK'},
 'County': {0: 'ALEUTIANS EAST',
  1: 'ALEUTIANS WEST',
  2: 'ANCHORAGE',
  3: 'BETHEL',
  4: 'BRISTOL BAY'},
 'Orientaion': {0: 'Plaintiff ',
  1: 'Plaintiff ',
  2: 'Neutral',
  3: 'Plaintiff ',
  4: 'Plaintiff '},
 'State': {0: 'Alaska', 1: 'Alaska', 2: 'Alaska', 3: 'Alaska', 4: 'Alaska'}}

Can anyone point me out how to fix this issue? any thoughts? Thanks

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

初吻给了烟 2025-02-08 13:01:22

您的三个XLSX文件中的两个在NC表的C列中具有流浪的背景字符。解决这些问题，应该可以工作。

Update ：

这是应该执行您想要的代码：

files_xlsx='''
./VenueMap_Counties_04-02-15.xlsx
./VenueMap_Counties_06-13-18.xlsx
./VenueMap_Counties_11-04-19.xlsx
'''.split('\n')[1:-1]
import pandas as pd
dfs = pd.concat([
    df.rename(columns={df.columns[0]:'County'}).assign(
    State=df.columns[0], ST=ST)[['ST', 'County', 'Orientation', 'State']] for f in files_xlsx for ST, df in pd.read_excel(f, sheet_name=None).items() if ST != 'Master'])
print(f'shape: {dfs.shape}')
print(dfs.head())

输出：

shape: (9351, 4)
   ST          County Orientation   State
0  AK  Aleutians East  Plaintiff   Alaska
1  AK  Aleutians West  Plaintiff   Alaska
2  AK       Anchorage     Neutral  Alaska
3  AK          Bethel  Plaintiff   Alaska
4  AK     Bristol Bay  Plaintiff   Alaska

Two of your three xlsx files have a stray backtick character in column C of the NC sheet. Fix these and it should work.

UPDATE:

Here is code that should do what you want:

files_xlsx='''
./VenueMap_Counties_04-02-15.xlsx
./VenueMap_Counties_06-13-18.xlsx
./VenueMap_Counties_11-04-19.xlsx
'''.split('\n')[1:-1]
import pandas as pd
dfs = pd.concat([
    df.rename(columns={df.columns[0]:'County'}).assign(
    State=df.columns[0], ST=ST)[['ST', 'County', 'Orientation', 'State']] for f in files_xlsx for ST, df in pd.read_excel(f, sheet_name=None).items() if ST != 'Master'])
print(f'shape: {dfs.shape}')
print(dfs.head())

Output:

shape: (9351, 4)
   ST          County Orientation   State
0  AK  Aleutians East  Plaintiff   Alaska
1  AK  Aleutians West  Plaintiff   Alaska
2  AK       Anchorage     Neutral  Alaska
3  AK          Bethel  Plaintiff   Alaska
4  AK     Bristol Bay  Plaintiff   Alaska

回复收藏 0 原文

~没有更多了~