ValueError: File is not a recognized Excel file. How do I read the .xls file?

Posted on 2025-01-16 04:26:33

I am trying to read an Excel file from a URL. The link and the code are below:


import pandas as pd

excel = 'https://www.marketnews.usda.gov/mnp/fv-report?&commAbr=AVOC&step3date=true&locAbr=HX&repType=termPriceDaily&refine=false&Run=Run&type=termPrice&repTypeChanger=termPriceDaily&environment=&_environment=1&locAbrPass=CHICAGO%7C%7CHX&locChoose=commodity&commodityClass=allcommodity&locAbrlength=1&organic=&repDate=01%2F01%2F2022&endDate=03%2F17%2F2022&format=excel&rebuild=false'

data = pd.read_excel(excel, engine='openpyxl')


I tried using openpyxl, but I get the following error:

File is not a zip file

I even tried pd.read_csv, but the data comes back as HTML, which isn't easily readable:


df = pd.read_csv('https://www.marketnews.usda.gov/mnp/fv-report?&commAbr=AVOC&step3date=true&locAbr=HX&repType=termPriceDaily&refine=false&Run=Run&type=termPrice&repTypeChanger=termPriceDaily&environment=&_environment=1&locAbrPass=CHICAGO%7C%7CHX&locChoose=commodity&commodityClass=allcommodity&locAbrlength=1&organic=&repDate=01%2F01%2F2022&endDate=03%2F17%2F2022&format=excel&rebuild=false',
                 sep='</tr><tr>'
           )

Please help!
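The "File is not a zip file" message is consistent with the server sending something other than an .xlsx workbook: .xlsx files are ZIP archives, and openpyxl only reads that format (a genuine legacy .xls would instead need pd.read_excel(..., engine='xlrd')). A quick way to see what a URL really returns is to look at its first few bytes. The sketch below relies only on the well-known ZIP and OLE2 (legacy .xls) file signatures; `detect_format` is a hypothetical helper, and its HTML check is a heuristic, not a full content sniffer.

```python
# Well-known file signatures ("magic numbers").
XLSX_SIGNATURE = b"PK\x03\x04"                      # .xlsx is a ZIP archive (what openpyxl expects)
XLS_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")   # legacy .xls is an OLE2 compound file
UTF8_BOM = b"\xef\xbb\xbf"

# Heuristic: typical ways an HTML page or fragment begins.
HTML_PREFIXES = (b"<!doctype html", b"<html", b"<head", b"<body", b"<table")


def detect_format(head: bytes) -> str:
    """Guess the real format of a download from its first bytes (Python 3.9+)."""
    if head.startswith(XLSX_SIGNATURE):
        return "xlsx"
    if head.startswith(XLS_SIGNATURE):
        return "xls"
    # HTML may be preceded by a UTF-8 BOM and/or whitespace.
    text = head.removeprefix(UTF8_BOM).lstrip().lower()
    if text.startswith(HTML_PREFIXES):
        return "html"
    return "unknown"
```

For example, you could open the report with `urllib.request.urlopen(excel)` and pass `resp.read(512)` to `detect_format`. If it reports "html", pd.read_html is the appropriate reader rather than pd.read_excel or pd.read_csv.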


Comments (2)

手心的海 2025-01-23 04:26:33

You won't be able to read this with pd.read_excel, because the URL does not point directly to an Excel file.

Instead of pd.read_excel, use the following code:

import pandas as pd
df = pd.read_html(excel)[0]
print(df)
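pd.read_html returns a list of DataFrames, one per <table> element it finds, which is why the code above takes [0]. It relies on lxml, or on BeautifulSoup4 plus html5lib, being installed. As a rough illustration of what it does, and as a possible fallback when neither parser is available, here is a minimal standard-library sketch. `TableExtractor` and `extract_tables` are hypothetical names; unlike pd.read_html, this version ignores colspan/rowspan and nested tables and does no type inference.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableExtractor(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings.

    Simplified sketch: nested tables, colspan/rowspan and <caption> are not handled.
    """

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self.tables: List[List[List[str]]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def _close_cell(self) -> None:
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell text.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self) -> None:
        self._close_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._close_row()
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html: str) -> List[List[List[str]]]:
    """Return all tables in document order, mirroring the list pd.read_html returns."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Assuming the first row holds the column headers, `rows = extract_tables(html)[0]` can be turned into a DataFrame with `pd.DataFrame(rows[1:], columns=rows[0])`. Every value stays a string until you convert it, for example with `pd.to_numeric`.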

半枫 2025-01-23 04:26:33

The URL returns an HTML page, not an Excel file.

Use:

df = pd.read_html(excel)[0]
print(df)

# Output
     Commodity Name City Name  ... Price Comment                                           Comments
0          AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...
1          AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...
2          AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...
3          AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...
4          AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...
...             ...       ...  ...           ...                                                ...
1368       AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...
1369       AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...
1370       AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...
1371       AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...
1372       AVOCADOS   CHICAGO  ...           NaN  Wide range in price, quality, condition and ap...

[1373 rows x 29 columns]