ValueError:长度不匹配 - 尝试读取PANDAS中有多个纸的多个XLSX文件时?
我正在尝试阅读多个XLSX
每个有51张纸的文件,我想在一个数据范围内与Pandas一起阅读,重新格式化和串联。但是,我能够正确读取一个xlsx
,使用51张纸,但是当我使用相同的逻辑来读取多个xlsx
每个具有51张纸的文件时,我会遇到以下错误:
---------------------------------------------------------------------- --------------------------------- valueerror Trackback(最近的电话 最后)输入[31],in 12印刷(名称) 13印刷(Len(名称)) ---> 14 src_df =(pd.concat({k:df.Assign(state = df.columns [0])。set_axis(['county'','eartientation'eartientation','state'], axis = 1)对于k,df in pd.read_excel(xls,sheet_name =) 名称).items()})。rename_axis(['st',none],, 轴= 0).Reset_index(level = 0)) 15 dfs.append(src_df)
在[31]中输入(.0) 12印刷(名称) 13印刷(Len(名称)) ---> 14 src_df =(pd.concat({k:df.Assign(state = df.columns [0])。set_axis(['county'','eartientation'eartientation','state'], axis = 1)对于k,df in pd.read_excel(xls,sheet_name =) 名称).items()})。rename_axis(['st',none],, 轴= 0).Reset_index(level = 0)) 15 dfs.append(src_df)..
valueerror:长度不匹配:预期轴有4个元素,新值 有3个元素
我当前的尝试:
这是示例数据对于我的尝试:
VenueMap_Counties_04-02-15.xlsx
VenueMap_Counties_06-13-18.xlsx
VenueMap_Counties_11-04-19.xlsx
这是我尝试阅读带有多张纸的XLSX
文件中的尝试:
files = glob.glob('https://github.com/adamFlyn/test_rl/*.xlsx')
files_xlsx = [f for f in files if f[-4:] == 'xlsx']
if len(files_xlsx)>0:
print("more than one xlsx files found")
# read multiple xlsx files into one
dfs = pd.DataFrame()
for f in files_xlsx:
xls = pd.ExcelFile(f)
sheetnames = xls.sheet_names[1:]
print(sheetnames)
print(len(sheetnames))
src_df = (pd.concat({k: df.assign(State=df.columns[0]).set_axis(['County','Orientation','State'], axis=1) for k, df in pd.read_excel(xls, sheet_name = sheetnames).items()}).rename_axis(['ST', None], axis=0).reset_index(level=0))
dfs.append(src_df)
有人有任何想法如何修复此值吗?有什么想法吗?
所需的输出:
在我所需的输出中,我想要index(['st','county','entientation','state'],dtype ='object')
如最终数据帧中的列。这是预期的输出数据框架的前5行:
desired_outputDf.head().to_dict()= {'ST': {0: 'AK', 1: 'AK', 2: 'AK', 3: 'AK', 4: 'AK'},
'County': {0: 'ALEUTIANS EAST',
1: 'ALEUTIANS WEST',
2: 'ANCHORAGE',
3: 'BETHEL',
4: 'BRISTOL BAY'},
'Orientaion': {0: 'Plaintiff ',
1: 'Plaintiff ',
2: 'Neutral',
3: 'Plaintiff ',
4: 'Plaintiff '},
'State': {0: 'Alaska', 1: 'Alaska', 2: 'Alaska', 3: 'Alaska', 4: 'Alaska'}}
谁能指出我如何解决此问题?有什么想法吗?谢谢
I am trying to read multiple xlsx
files where each has 51 sheets and I want to read, reformat and concatenate them in one dataframe with pandas. However, I am able to read one xlsx
with 51 sheets correctly but when I use same logic for reading multiple xlsx
files where each has 51 sheets, I am getting following error:
--------------------------------------------------------------------------- ValueError Traceback (most recent call
last) Input In [31], in
12 print(names)
13 print(len(names))
---> 14 src_df = (pd.concat({k: df.assign(State=df.columns[0]).set_axis(['County','Orientation','State'],
axis=1) for k, df in pd.read_excel(xls, sheet_name =
names).items()}).rename_axis(['ST', None],
axis=0).reset_index(level=0))
15 dfs.append(src_df)Input In [31], in (.0)
12 print(names)
13 print(len(names))
---> 14 src_df = (pd.concat({k: df.assign(State=df.columns[0]).set_axis(['County','Orientation','State'],
axis=1) for k, df in pd.read_excel(xls, sheet_name =
names).items()}).rename_axis(['ST', None],
axis=0).reset_index(level=0))
15 dfs.append(src_df) .. ..ValueError: Length mismatch: Expected axis has 4 elements, new values
have 3 elements
my current attempt:
here are the example data that I used for my attempt:
VenueMap_Counties_04-02-15.xlsx
VenueMap_Counties_06-13-18.xlsx
VenueMap_Counties_11-04-19.xlsx
and here is my attempt to read those xlsx
files with multiple sheets into one dataframe:
files = glob.glob('https://github.com/adamFlyn/test_rl/*.xlsx')
files_xlsx = [f for f in files if f[-4:] == 'xlsx']
if len(files_xlsx)>0:
print("more than one xlsx files found")
# read multiple xlsx files into one
dfs = pd.DataFrame()
for f in files_xlsx:
xls = pd.ExcelFile(f)
sheetnames = xls.sheet_names[1:]
print(sheetnames)
print(len(sheetnames))
src_df = (pd.concat({k: df.assign(State=df.columns[0]).set_axis(['County','Orientation','State'], axis=1) for k, df in pd.read_excel(xls, sheet_name = sheetnames).items()}).rename_axis(['ST', None], axis=0).reset_index(level=0))
dfs.append(src_df)
Does anyone have any ideas how to fix this valueError? Any thoughts?
desired output:
in my desired output, I want Index(['ST', 'County', 'Orientation', 'State'], dtype='object')
as of columns in final dataframe. this is top 5 rows of desired output dataframe:
desired_outputDf.head().to_dict()= {'ST': {0: 'AK', 1: 'AK', 2: 'AK', 3: 'AK', 4: 'AK'},
'County': {0: 'ALEUTIANS EAST',
1: 'ALEUTIANS WEST',
2: 'ANCHORAGE',
3: 'BETHEL',
4: 'BRISTOL BAY'},
'Orientaion': {0: 'Plaintiff ',
1: 'Plaintiff ',
2: 'Neutral',
3: 'Plaintiff ',
4: 'Plaintiff '},
'State': {0: 'Alaska', 1: 'Alaska', 2: 'Alaska', 3: 'Alaska', 4: 'Alaska'}}
Can anyone point me out how to fix this issue? any thoughts? Thanks
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您的三个XLSX文件中的两个在NC表的C列中具有流浪的背景字符。解决这些问题,应该可以工作。
Update :
这是应该执行您想要的代码:
输出:
Two of your three xlsx files have a stray backtick character in column C of the NC sheet. Fix these and it should work.
UPDATE:
Here is code that should do what you want:
Output: