将.xls文件转换为Python中的最新版本
我在网站上进行了一个Web_scrap,该网站通过该网站将.xls文件的下载引向下载,该文件替换了目标文件夹中的旧文件,根据Python脚本,下面:
我的脚本::
import time
from selenium import webdriver
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.webdriver.support.select import Select
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
import shutil
import os
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd
import matplotlib
driver = webdriver.Chrome(ChromeDriverManager().install())
driver = webdriver.Chrome()
driver.get('http://estatisticas.cetip.com.br/astec/series_v05/paginas/lum_web_v05_series_introducao.asp?str_Modulo=Ativo&int_Idioma=1&int_Titulo=6&int_NivelBD=2/')
driver.find_element_by_xpath('//*[@id="divContainerIframeBmf"]/div/dl/dd[2]/a').click()
time.sleep(3)
driver.switch_to.frame(driver.find_element(By.XPATH, '//iframe[@name="dados_corpo"]'))
driver.switch_to.frame(driver.find_element(By.XPATH, '//frame[@name="ativo"]'))
find_dp1 = driver.find_element(By.XPATH, '//select[@name="ativo"]')
select_find_dp1 = Select(find_dp1)
select_find_dp1.select_by_visible_text("CBIO - Crédito de descarbonização")
time.sleep(3)
driver.switch_to.default_content()
driver.switch_to.frame(driver.find_element(By.ID, 'dados_corpo'))
driver.switch_to.frame(driver.find_element(By.TAG_NAME, 'frameset').find_elements(By.TAG_NAME, 'frame')[1])
time.sleep(1)
informacoes = Select(driver.find_element(By.NAME, 'selectopcoes'))
informacoes.select_by_visible_text('Estoque')
driver.switch_to.default_content()
driver.switch_to.frame(driver.find_element(By.ID, 'dados_corpo'))
driver.switch_to.frame(driver.find_element(By.TAG_NAME, 'frameset').find_elements(By.TAG_NAME, 'frame')[2])
time.sleep(1)
# Data Inicial
driver.find_element(By.NAME, 'DT_DIA_DE').send_keys('16')
driver.find_element(By.NAME, 'DT_MES_DE').send_keys('10')
driver.find_element(By.NAME, 'DT_ANO_DE').send_keys('2020')
# Data Final
driver.find_element(By.NAME, 'DT_DIA_ATE').send_keys('10')
driver.find_element(By.NAME, 'DT_MES_ATE').send_keys('02')
driver.find_element(By.NAME, 'DT_ANO_ATE').send_keys('2022')
driver.find_elements(By.CLASS_NAME, 'button')[1].click()
driver.switch_to.default_content()
driver.switch_to.frame(driver.find_element(By.TAG_NAME, 'iframe'))
time.sleep(1)
driver.find_element(By.CLASS_NAME, 'primary-text').find_element(By.TAG_NAME,'a').click()
time.sleep(4)
origem = 'C:\\Users\\prmatteo\\Downloads\\'
destino = os.path.join(origem, 'C:\\Users\\prmatteo\\xxx\\Área de Trabalho\\Arquivos Python\\renovabio2.xls')
extensao = '.xls'
for file in os.listdir(origem):
if file.endswith(extensao):
shutil.move(os.path.join(origem,file), destino)
它总是以旧的Excel格式下载.xls文件。我想 当我去替换它时,将其转换为最新的Excel版本 目的地,以免以兼容模式格式打开 个人电脑。
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我尝试从该站点下载文件,不幸的是,它根本不会产生Excel文件。许多站点通过生成带有表和伪造
xls
扩展名的html文件来假冒Excel导出。 Excel认识到这一点,并试图将文件导入,就好像是文本或HTML一样,并向您显示警告。不幸的是,在这种情况下,文件甚至不是CSV。这是保存为文本的结果页面,包括标题。它甚至都不使用UTF8,因此非US角色会被弄脏。表数据只是一些行,带有选项卡作为定界符:
您必须解析此文本文件并自己创建一个Excel文件。您可以这样做的一种方法是使用pandas用 read_csv 跳过前9行,然后用 to_excel :
方法允许您指定不同的定分列器,跳过标题和页脚行,更改编码等。
一个可能的问题是日期格式。除非您另行指定,否则
read_csv
将导入日期为文本。您可以将其告知似乎是日期的单元,甚至尝试使用适当的参数推断日期格式。您可以通过几种方式检查加载数据。如果键入
DF
,您将看到数据框架的第一行和最后几行。您可以使用df.info()
获取列数及其类型等。I tried to download a file from that site and unfortunately for you, it doesn't produce Excel files at all. A lot of sites fake Excel exports by generating a CSV file or an HTML file with a table and a fake
xls
extension. Excel recognizes this and tries to import the file as if it was text or HTML and shows you a warning.Unfortunately, in this case the file isn't even a CSV. It's the result page saved as text, including the headers. It doesn't even use UTF8 so non-US characters get mangled. The table data is just some rows with tabs as delimiters :
You'll have to parse this text file and create an Excel file yourself. One way you could do this is to use Pandas to read the file as a CSV with read_csv skipping the first 9 rows, and then save it as Excel with to_excel:
The read_csv method allows you to specify different delimiters, skip header and footer rows, change the encoding etc.
One possible problem is the date format. Unless you specify otherwise,
read_csv
will import dates as text. You can tell it to parse cells that appear to be dates and even try to infer the date format with the proper parameters.You can inspect the loaded data in several ways. If you type
df
you'll see the first and last few rows of the DataFrame. You can usedf.info()
to get the number of columns and their types etc.如果已安装了XLRD和PANDAS,请使用PANDAS.READ_EXCEL将数据读取到数据框中。然后使用pandas.to_excel将文件输出到XLSX。
If you have xlrd and pandas installed, read the data into a DataFrame using pandas.read_excel. Then output the file to xlsx using pandas.to_excel.