How to delete all files from an HDFS directory using Scala
For a project I am currently working on with Scala and Spark, I have to write code that checks whether the HDFS directory I am working with is empty and, if it is not, removes every file from that directory.
Before I deploy my code to Azure, I am testing it against a local directory on my computer.
I am starting by writing a method that deletes every file in this directory. This is what I have so far:
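import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object DirectoryCleaner {
  // Local Spark session, used for testing before deployment
  val spark: SparkSession = SparkSession.builder()
    .master("local[3]")
    .appName("SparkByExamples.com")
    .getOrCreate()

  // File system resolved from Spark's Hadoop configuration
  val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
  val srcPath = new Path("C:\\Users\\myuser\\Desktop\\test_dir\\file1.csv")

  // Deletes srcPath only if it exists and is a regular file
  def deleFilesDir(): Unit = {
    if (fs.exists(srcPath) && fs.isFile(srcPath))
      fs.delete(srcPath, true)
  }
}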
With this code, I am able to delete a single file (file1.csv). I would like to define my path as val srcPath = new Path("C:\\Users\\myuser\\Desktop\\test_dir") (without specifying a filename) and simply delete every file in the test_dir directory. Any idea how I could do that?
Thanks for helping.
Comments (1)
Use fs.listFiles to get all the files in the directory, then loop over them and delete each one. Also, set the recursive flag to false so you don't recurse into subdirectories.
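The suggestion above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the object name DirectoryCleanerSketch and the method deleteFilesIn are hypothetical names introduced here, and the FileSystem is passed in as a parameter rather than taken from the SparkSession as in the question. It relies on the Hadoop calls FileSystem.listFiles(path, recursive) and FileSystem.delete(path, recursive), and it collects the paths before deleting so the listing is not modified while it is being iterated.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper, not part of the original question's code.
object DirectoryCleanerSketch {

  /** Deletes every regular file directly inside `dir` and returns how many
    * were deleted. Subdirectories and their contents are left untouched.
    * Returns 0 if `dir` does not exist or is not a directory.
    */
  def deleteFilesIn(fs: FileSystem, dir: Path): Int = {
    if (!fs.exists(dir) || !fs.getFileStatus(dir).isDirectory) {
      0
    } else {
      // recursive = false: list only the files directly under `dir`
      val listing = fs.listFiles(dir, false)

      // Collect the paths first, then delete, so that the directory
      // is not changed while the listing is still being read.
      val paths = ArrayBuffer.empty[Path]
      while (listing.hasNext) {
        paths += listing.next().getPath
      }

      // delete returns true on success; the recursive flag is irrelevant
      // for plain files, so false is passed here.
      paths.count(p => fs.delete(p, false))
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: the default Configuration resolves to the file system
    // that holds the directory given as the first argument.
    val fs = FileSystem.get(new Configuration())
    val deleted = deleteFilesIn(fs, new Path(args(0)))
    println(s"Deleted $deleted file(s)")
  }
}
```

With the fs value from the question, this could be called as DirectoryCleanerSketch.deleteFilesIn(fs, new Path("C:\\Users\\myuser\\Desktop\\test_dir")). A return value of 0 means there were no files directly in the directory (or the directory did not exist), which also covers the "is it empty" check for files, though not for subdirectories.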