How does the lxml module access the Internet?

Published on 2025-02-12 18:03:25 · 816 characters · 1 view · 0 comments

Does anyone know how the lxml module accesses the Internet?

I'm pulling data from a website into Excel via pandas.

I understand that pandas reaches the URL through the lxml module.

Does anyone know how lxml gets to the Internet? Through 'requests'? I can't find the code.

My code -

import time
import datetime
import pandas as pd


tickers = ['TSLA', 'TWTR', 'MSFT', 'GOOG', 'AAPL']
interval = '1d'
period1 = int(time.mktime(datetime.datetime(2022, 1, 1).timetuple()))
period2 = int(time.mktime(datetime.datetime(2022, 6, 30).timetuple()))


xlwriter = pd.ExcelWriter('historical prices.xlsx', engine='openpyxl')

for ticker in tickers:
    query_string = f'https://query1.finance.yahoo.com/v7/finance/download/{ticker}?period1={period1}&period2={period2}&interval={interval}&events=history&includeAdjustedClose=true'
    df = pd.read_csv(query_string)
    df.to_excel(xlwriter, sheet_name=ticker, index=False)

xlwriter.close()  # ExcelWriter.save() was removed in pandas 2.0; close() writes the file
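As an aside, the download that pandas performs for each ticker can be reproduced by hand, which makes the HTTP layer visible and shows there is no lxml in the retrieval path. A minimal sketch; fetch_csv is a hypothetical helper (not pandas API), and a data: URL stands in for the Yahoo endpoint so the example runs offline:

```python
import urllib.request
from io import BytesIO

import pandas as pd


def fetch_csv(url, headers=None):
    """Fetch a CSV over a URL roughly the way pandas does internally:
    urllib.request for the transport, BytesIO + read_csv for parsing.
    (fetch_csv is a hypothetical helper, not part of pandas.)"""
    req = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(req) as resp:
        return pd.read_csv(BytesIO(resp.read()))


# A data: URL containing "Date,Close\n2022-01-03,399.9\n" stands in for
# the Yahoo query string so the sketch needs no network access.
df = fetch_csv("data:text/csv,Date%2CClose%0A2022-01-03%2C399.9%0A")
print(df.shape)  # (1, 2)
```

For the real endpoint you would pass the query string from the loop above instead of the data: URL.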



Comments (1)

多像笑话 2025-02-19 18:03:25

It's not using lxml. It depends on what you are trying to read, whether it's a CSV/XML/HTML/JSON and so on. lxml is used for parsing, not for retrieving the file.

When you use a URL, pandas calls the function _get_filepath_or_buffer, which you can find here.

Here is the link.

In the source code you will find:

    # Excerpt from pandas' _get_filepath_or_buffer in pandas/io/common.py:
    import urllib.request

    # assuming storage_options is to be interpreted as headers
    req_info = urllib.request.Request(filepath_or_buffer, headers=storage_options)
    with urlopen(req_info) as req:
        content_encoding = req.headers.get("Content-Encoding", None)
        if content_encoding == "gzip":
            # Override compression based on Content-Encoding header
            compression = {"method": "gzip"}
        reader = BytesIO(req.read())
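The storage_options dict in that excerpt is the same one you can pass yourself, e.g. pd.read_csv(url, storage_options={...}); for a plain http(s) URL pandas treats it as HTTP headers. A minimal sketch of the Request object that gets built (constructing it makes no network call; the URL is just the one from the question):

```python
import urllib.request

# For a plain http(s) URL, storage_options becomes the headers dict that
# pandas hands to urllib.request.Request, as in the excerpt above.
storage_options = {"User-Agent": "Mozilla/5.0"}
req_info = urllib.request.Request(
    "https://query1.finance.yahoo.com/v7/finance/download/TSLA",
    headers=storage_options,
)

# urllib normalizes header names via str.capitalize(), so the stored
# key is "User-agent".
print(req_info.get_header("User-agent"))  # Mozilla/5.0
```

This is also why passing a User-Agent via storage_options can help with endpoints that reject the default urllib client string.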

