HDFS commands in a PySpark script

Posted on 2025-01-27 13:33:01

I am writing a simple PySpark script to copy HDFS files and folders from one location to another. I have gone through many docs and answers available online, but I could not find a way to copy folders and files using PySpark, or to execute HDFS commands from PySpark (in particular, to copy folders and files).

Below is my code:

```python
# sc is the active SparkContext (e.g. spark.sparkContext)
hadoop = sc._jvm.org.apache.hadoop
Path = hadoop.fs.Path
FileSystem = hadoop.fs.FileSystem
conf = hadoop.conf.Configuration()
fs = FileSystem.get(conf)
source = Path('/user/xxx/data')
destination = Path('/user/xxx/data1')

if fs.exists(source):
    # List every file and subfolder directly under the source directory
    for f in fs.listStatus(source):
        print('File path', str(f.getPath()))
        # TODO: how do I copy f.getPath() to destination here?
```

Thanks in advance
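A minimal sketch of one way to do the copy, assuming `sc` is the active SparkContext and that Hadoop's `org.apache.hadoop.fs.FileUtil.copy(srcFS, src, dstFS, dst, deleteSource, overwrite, conf)` is called through PySpark's py4j gateway. Note that `sc._jvm` and `sc._jsc` are internal, undocumented PySpark attributes, and `copy_hdfs_path` is a hypothetical helper name. Because `FileUtil.copy` recurses into directories on its own, there is no need to loop over `listStatus` and copy each entry by hand.

```python
def copy_hdfs_path(sc, src, dst, overwrite=False, delete_source=False):
    """Copy a file or an entire directory tree from src to dst.

    Sketch only: relies on PySpark's internal py4j handles (sc._jvm, sc._jsc)
    to call Hadoop's org.apache.hadoop.fs.FileUtil.copy.
    Returns the boolean result reported by FileUtil.copy.
    """
    hadoop_fs = sc._jvm.org.apache.hadoop.fs
    # Reuse the Hadoop configuration Spark was started with, so that
    # fs.defaultFS and other cluster settings apply.
    conf = sc._jsc.hadoopConfiguration()
    src_path = hadoop_fs.Path(src)
    dst_path = hadoop_fs.Path(dst)
    # Resolve a FileSystem per path, so the source and destination may use
    # different schemes or clusters (e.g. hdfs:// vs file://).
    src_fs = src_path.getFileSystem(conf)
    dst_fs = dst_path.getFileSystem(conf)
    # Fail early with a Python exception instead of a wrapped Java one.
    if not src_fs.exists(src_path):
        raise FileNotFoundError(src)
    # FileUtil.copy(srcFS, src, dstFS, dst, deleteSource, overwrite, conf)
    # recurses into directories itself, so no manual listStatus loop is needed.
    return hadoop_fs.FileUtil.copy(
        src_fs, src_path, dst_fs, dst_path, delete_source, overwrite, conf
    )
```

For the paths in the question this would be called as `copy_hdfs_path(sc, '/user/xxx/data', '/user/xxx/data1')`. If the destination already exists as a directory, Hadoop places the copy inside it (here `/user/xxx/data1/data`), so it is worth checking this behavior on your Hadoop version. Alternatively, the `hdfs dfs -cp` shell command can be run on the driver with Python's `subprocess` module, provided the Hadoop client is installed there.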
