How do I save a file with the current date and time as the file name in PySpark?
I have a data frame in PySpark and would like to save the file as a CSV with the current timestamp as a file name. I am executing this in Azure Synapse Notebook and would like to run the notebook every day.
I stored my data frame as "df"
I am using the code below to save the file as {date}.csv:
from datetime import datetime
date = datetime.now().strftime("%Y_%m_%d-%I:%M:%S_%p")
df.coalesce(1).write.option("mode","append").option("header","true").option("sep",",").csv("abfss://[email protected]/{date}.csv")
I am saving the CSV file in the data lake, but it is saved as a folder named "{date}.csv", and the CSV file itself is inside that folder.
Required Output:
I need the file name to be "29-06-2022 15:30:25 PM.csv" without creating a new folder. I am running the notebook every day, so each day's file should be named with the current date and time.
Can anyone advise what the issue is in the above code?
Note that I need to execute this only in PySpark, not in Python.
Comments (2)
You are doing everything right; just add an f before the string so that it picks up the date variable.

You can also use + to join the file path string with the variable.
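The two suggestions above can be sketched as follows. This is a minimal illustration, not the answerers' original code, which did not survive on this page. The helper names path_with_fstring and path_with_concat and the BASE_PATH placeholder are hypothetical, and the commented-out write call assumes the data frame is called df as in the question. It also uses .mode("append"), the DataFrameWriter method for setting the save mode, in place of the question's option("mode", "append").

```python
from datetime import datetime


def path_with_fstring(base_path: str, now: datetime) -> str:
    # Same timestamp format as in the question.
    date = now.strftime("%Y_%m_%d-%I:%M:%S_%p")
    # The leading f makes Python substitute {date}; without it the
    # literal text "{date}" ends up in the path.
    return f"{base_path}/{date}.csv"


def path_with_concat(base_path: str, now: datetime) -> str:
    date = now.strftime("%Y_%m_%d-%I:%M:%S_%p")
    # Equivalent result, using + to join the pieces.
    return base_path + "/" + date + ".csv"


# Hypothetical placeholder; substitute the real abfss:// URI of your container.
BASE_PATH = "abfss://<container>@<account>.dfs.core.windows.net"

output_path = path_with_fstring(BASE_PATH, datetime.now())
print(output_path)

# In the notebook, the path would then be passed to the writer, for example:
# df.coalesce(1).write.mode("append").option("header", "true").option("sep", ",").csv(output_path)
```

Either form fixes the literal "{date}" in the name. Note that Spark's CSV writer treats the path as a directory and writes part files inside it, so the result is still a folder with that name rather than a single file.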