GitHub Actions - Where are downloaded files saved?

Posted 2025-01-10 22:36:16

I've seen plenty of questions and docs about how to download artifacts generated in a workflow so they can be passed between jobs. However, I've only found one thread about persisting downloaded files between steps of the same job, and I'm hoping someone can clarify how this should work, because the answer in that thread doesn't make sense to me.

I'm building a workflow that uses Selenium to navigate a site and export data manually (sadly, there is no API). When I run it locally, I can navigate the site without problems and click a button that downloads a CSV, which I then re-import for further processing (ultimately, it gets cleaned and sent to Redshift). When I run it in GitHub Actions, however, it's unclear to me where the file is downloaded, so I can't re-import it. Some things I've tried:

  1. Echoing the working directory when the workflow runs, and pointing my pandas.read_csv() call at a file in that directory.
  2. Downloading the file and then printing os.listdir() to show the contents of the working directory. The CSV file is not listed, which makes me believe it was not saved to the working directory as expected (and would explain why #1 doesn't work).
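
The os.listdir() check in step 2 only looks at a single directory, non-recursively. A minimal sketch of a broader check, assuming the export is a .csv file written within the last few minutes, is to walk several candidate directories and list recently modified CSV files. The helper name find_recent_csvs and the candidate locations in the usage note below are hypothetical guesses, not documented GitHub Actions paths.

```python
import os
import time


def find_recent_csvs(roots, max_age_seconds=600):
    """Return paths of .csv files under any directory in `roots` that were
    modified within the last `max_age_seconds`, newest first.

    Unlike os.listdir(), this searches each root recursively.
    """
    cutoff = time.time() - max_age_seconds
    hits = []
    for root in roots:
        base = os.path.expanduser(str(root))  # allow "~/Downloads"-style entries
        if not os.path.isdir(base):
            continue  # skip candidate locations that don't exist on this machine
        # os.walk silently skips directories it is not allowed to read
        for dirpath, _dirnames, filenames in os.walk(base):
            for name in filenames:
                if not name.lower().endswith(".csv"):
                    continue
                full_path = os.path.join(dirpath, name)
                try:
                    mtime = os.path.getmtime(full_path)
                except OSError:
                    continue  # file disappeared between listing and stat
                if mtime >= cutoff:
                    hits.append((mtime, full_path))
    hits.sort(key=lambda item: item[0], reverse=True)  # newest first
    return [path for _, path in hits]
```

For example, printing find_recent_csvs([os.getcwd(), "~/Downloads", "/tmp"]) a few seconds after clicking the export button should show whether the file landed somewhere other than the working directory. Downloads are asynchronous, so checking immediately after the click may find nothing yet.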

FWIW, the website does not let me choose where the file is downloaded. When I run it locally and click the button on the site, the CSV is automatically saved to my Downloads folder. So I'm at the mercy of wherever GitHub decides to save the file.
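
On the runner, the download location is decided by the browser's own download settings rather than by GitHub, so one common approach is to tell Chrome where to save files through its preferences and then wait for the download to finish. A minimal sketch, assuming Chrome driven by Selenium: download.default_directory and download.prompt_for_download are Chrome preference keys, while chrome_download_prefs and wait_for_download are hypothetical helpers. The .crdownload check assumes Chrome's partial-file naming (Firefox, for instance, uses .part files), and download behavior can differ across Chrome versions and headless modes, so treat this as a starting point.

```python
import os
import time


def chrome_download_prefs(download_dir):
    """Build Chrome preferences that save downloads to `download_dir`
    without a "Save as" prompt. The directory is created up front so both
    Chrome and wait_for_download() can rely on it existing."""
    path = os.path.abspath(download_dir)
    os.makedirs(path, exist_ok=True)
    return {
        "download.default_directory": path,
        "download.prompt_for_download": False,
    }


def wait_for_download(download_dir, suffix=".csv", timeout=60.0, poll=0.5):
    """Poll `download_dir` until a finished file ending in `suffix` exists.

    Assumes Chrome's convention of writing in-progress downloads as
    *.crdownload; the download counts as finished once a matching file
    exists and no partial files remain. Returns the newest matching path.
    """
    suffix = suffix.lower()
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(download_dir)
        in_progress = any(n.endswith(".crdownload") for n in names)
        finished = [
            os.path.join(download_dir, n) for n in names if n.lower().endswith(suffix)
        ]
        if finished and not in_progress:
            return max(finished, key=os.path.getmtime)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished {suffix} file in {download_dir} after {timeout}s")
        time.sleep(poll)


# Possible wiring with Selenium 4 (not executed here):
#   from selenium import webdriver
#   options = webdriver.ChromeOptions()
#   options.add_argument("--headless=new")
#   options.add_experimental_option("prefs", chrome_download_prefs("downloads"))
#   driver = webdriver.Chrome(options=options)
#   ...navigate and click the export button...
#   csv_path = wait_for_download("downloads")
```

Pointing the download directory at a folder inside the checkout (for example under $GITHUB_WORKSPACE) also means later steps in the same job can read the file, since the steps of a job run on the same runner and share its file system.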

Last, because I suspect someone will suggest this: using read_html() to scrape the data from the page's HTML is not an option for me.

Thanks in advance!
