How to delete all files from an HDFS directory using Scala

Posted on 2025-01-20 02:47:10


For a project I am currently working on with Scala and Spark, I have to write code that checks whether the HDFS directory I am working with is empty, and if it is not, remove every file from that directory.

Before I deploy my code to Azure, I am testing it with a local directory on my computer.

I am starting by writing a method to delete every file in this directory. This is what I have so far:

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object DirectoryCleaner {

  val spark: SparkSession = SparkSession.builder()
    .master("local[3]")
    .appName("SparkByExamples.com")
    .getOrCreate()

  // FileSystem handle built from the Spark job's Hadoop configuration
  val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
  val srcPath = new Path("C:\\Users\\myuser\\Desktop\\test_dir\\file1.csv")

  def deleFilesDir(): Unit = {
    // delete srcPath only if it exists and is a regular file
    if (fs.exists(srcPath) && fs.isFile(srcPath))
      fs.delete(srcPath, true)
  }

}

With this code, I am able to delete a single file (file1.csv). I would like to be able to define my path as val srcPath = new Path("C:\\Users\\myuser\\Desktop\\test_dir") (without specifying any filename), and simply delete every file in the test_dir directory. Any idea how I could do that?

Thanks for your help.


Comments (1)

九局 2025-01-27 02:47:10


Use fs.listFiles to get all the files in the directory and then loop through them, deleting each one. Also, set the recursive flag to false so that you don't recurse into subdirectories.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

def deleteAllFiles(directoryPath: String, fs: FileSystem): Unit = {

  val path = new Path(directoryPath)

  // get all files directly under the directory (non-recursive)
  val files = fs.listFiles(path, false)

  // delete each file
  while (files.hasNext) {
    val file = files.next()
    fs.delete(file.getPath, false)
  }

}

// Example for a local, non-HDFS path
val directoryPath = "file:///Users/m_vemuri/project"
val fs = FileSystem.get(new Configuration())
deleteAllFiles(directoryPath, fs)
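
The question also asks how to check whether the directory is empty before cleaning it, which the answer above does not show. A minimal sketch of that part, assuming the same FileSystem handle and the deleteAllFiles method from the answer (the helper name cleanIfNotEmpty is hypothetical):

import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: clean the directory only when it exists and is non-empty.
// Reuses the deleteAllFiles method defined in the answer above.
def cleanIfNotEmpty(directoryPath: String, fs: FileSystem): Unit = {
  val path = new Path(directoryPath)
  // listStatus returns the immediate children; an empty array means the directory is empty
  if (fs.exists(path) && fs.listStatus(path).nonEmpty) {
    deleteAllFiles(directoryPath, fs)
  }
}

Another common approach, if the directory itself can be recreated, is fs.delete(path, true) followed by fs.mkdirs(path), which wipes everything under the path (including subdirectories) in one call.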