How do I save a file in PySpark with the current date and time as the file name?

Posted 2025-02-11 08:05:04

I have a data frame in PySpark and would like to save the file as a CSV with the current timestamp as a file name. I am executing this in Azure Synapse Notebook and would like to run the notebook every day.

My data frame is stored as "df".

I am using the code below, intending to save the file as {date}.csv:

from datetime import datetime
date = datetime.now().strftime("%Y_%m_%d-%I:%M:%S_%p")

df.coalesce(1).write.option("mode","append").option("header","true").option("sep",",").csv("abfss://[email protected]/{date}.csv")

The CSV is written to the data lake, but it is saved as a folder literally named "{date}.csv", and the actual CSV file sits inside that folder.

Required Output:

I need the file name to be "29-06-2022 15:30:25 PM.csv", without a new folder being created. I run the notebook every day, so each day's file should be named with the current date and time.

Can anyone advise what the issue is in the above code?

Note that I need to execute this only in PySpark, not in Python.

清音悠歌 2025-02-18 08:05:04

Your code is almost right; just add an f before the string so that it becomes an f-string. The {date} placeholder will then be replaced by the value of the date variable:

.csv(f"abfss://[email protected]/{date}.csv")
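To see why the f prefix matters, here is a minimal sketch that compares the two strings without touching Spark. The fixed timestamp and the `base` placeholder path (borrowed from the placeholder form used elsewhere in this thread) are assumptions for illustration only; a real run would use `datetime.now()` and the actual container URL.

```python
from datetime import datetime

# Fixed timestamp so the example is reproducible (assumption);
# a daily notebook run would call datetime.now() instead.
now = datetime(2022, 6, 29, 15, 30, 25)
date = now.strftime("%Y_%m_%d-%I:%M:%S_%p")

# Placeholder base path, for illustration only.
base = "abfss://<container_name>@<storage_accountname>.dfs.core.windows.net"

plain = "{base}/{date}.csv"       # no f prefix: the braces stay literal text
formatted = f"{base}/{date}.csv"  # f prefix: base and date are substituted

print(plain)
print(formatted)
```

The first string is exactly what the original code passed to `.csv(...)`, which is why the output was named "{date}.csv". Note that the f-string only fixes the name: Spark's CSV writer still treats the path as a directory and places part files inside it.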

亚希 2025-02-18 08:05:04

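You can also use + to concatenate the file path string with the variable, as below. Note that to_csv is a pandas-style method rather than part of the Spark DataFrame API, so a Spark DataFrame would first need converting, for example with toPandas():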

from datetime import datetime
date = datetime.now().strftime("%Y_%m_%d-%I:%M:%S_%p")

df.to_csv("abfss://<container_name>@<storage_accountname>.dfs.core.windows.net/from/file1_"+date+".csv", sep=',', encoding='utf-8', index=False)
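If the data has to be written with Spark itself, the writer will still produce a folder containing a part-*.csv file (plus markers such as _SUCCESS). A common workaround is to write to a temporary folder and then move the single part file to the desired name. The sketch below shows that idea on the local filesystem only; `promote_single_part_file` is a hypothetical helper, the folder contents are simulated, and on a data lake the move would have to go through a storage-aware file utility instead of `shutil`. The target name uses underscores instead of colons on the assumption that colons may not be accepted in Hadoop-style paths.

```python
import shutil
import tempfile
from pathlib import Path


def promote_single_part_file(output_dir: Path, target_file: Path) -> Path:
    """Move the only part-*.csv in output_dir to target_file, then delete output_dir."""
    part_files = sorted(output_dir.glob("part-*.csv"))
    if len(part_files) != 1:
        raise ValueError(f"expected exactly one part file, found {len(part_files)}")
    shutil.move(str(part_files[0]), str(target_file))
    shutil.rmtree(output_dir)  # removes leftovers such as the _SUCCESS marker
    return target_file


# Simulate what a coalesce(1) CSV write leaves behind (illustrative names).
workdir = Path(tempfile.mkdtemp())
out_dir = workdir / "tmp_output"
out_dir.mkdir()
(out_dir / "part-00000-abc.csv").write_text("id,name\n1,a\n")
(out_dir / "_SUCCESS").write_text("")

final_path = promote_single_part_file(out_dir, workdir / "2022_06_29-15_30_25.csv")
print(final_path.name)
```

After the call, the working directory holds one CSV file with the timestamped name and the temporary folder is gone.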