Find files and tar them (with spaces)

Posted 2024-11-05 04:18:30, 234 words, 5 views, 0 comments

Alright, a simple problem here. I'm working on a simple backup script. It works fine unless the file names contain spaces. This is how I'm finding files and adding them to a tar archive:

find . -type f | xargs tar -czvf backup.tar.gz 

The problem occurs when a file has a space in its name, because tar then thinks it's a folder. Basically, is there a way to add quotes around the results from find? Or is there a different way to fix this?
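
To make the failure concrete, here is a minimal sketch that counts how many arguments xargs actually passes on. The scratch directory and the file names are hypothetical, and the -0 option assumes a GNU or BSD xargs.

```shell
#!/bin/sh
# Hypothetical scratch directory and file names, used only for illustration.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch "my report.txt" plain.txt

# Default xargs splitting: "./my report.txt" arrives as two separate arguments.
split_count=$(find . -type f | xargs sh -c 'echo $#' sh)

# NUL-delimited names: each path stays a single argument.
nul_count=$(find . -type f -print0 | xargs -0 sh -c 'echo $#' sh)

echo "whitespace-split arguments: $split_count"
echo "NUL-delimited arguments:    $nul_count"
```

With two files on disk, the first count comes out as 3 and the second as 2, which is why tar ends up looking for names that do not exist.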

Comments (10)

你穿错了嫁妆 2024-11-12 04:18:31

Why not:

tar czvf backup.tar.gz *

Sure, it's clever to use find and then xargs, but you're doing it the hard way.

Update: Porges has commented with a find option that I think is a better answer than mine or the other one: find -print0 ... | xargs -0 ....

羁拥 2024-11-12 04:18:31

If you have multiple files or directories and want to compress each one into its own *.gz file, you can do this. The -type f and time tests (-mtime here, or -atime) are optional.

find . -name "httpd-log*.txt" -type f -mtime +1 -exec tar -vzcf {}.gz {} \;

This will compress

httpd-log01.txt
httpd-log02.txt

to

httpd-log01.txt.gz
httpd-log02.txt.gz
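
Note that the command above writes a gzip-compressed tar archive under each .gz name, so the result has to be unpacked with tar rather than gunzip. If plain gzip files are what you want, a sketch along these lines may be closer. It assumes a gzip that supports -k (keep the original), and the scratch directory is hypothetical.

```shell
#!/bin/sh
# Hypothetical scratch directory; the log names mirror the example above.
logs=$(mktemp -d)
printf 'first\n' > "$logs/httpd-log01.txt"
printf 'second\n' > "$logs/httpd-log02.txt"

# gzip -k keeps the original file next to the new .gz (without -k it is replaced).
# The -mtime +1 filter is left out because these files were only just created.
find "$logs" -name "httpd-log*.txt" -type f -exec gzip -k {} \;

gz_count=$(find "$logs" -name "httpd-log*.txt.gz" -type f | wc -l)
echo "compressed files: $gz_count"
```

Because find hands each path to gzip as a single argument, names containing spaces need no extra quoting here.
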
叶落知秋 2024-11-12 04:18:31

I would have added this as a comment on @Steve Kehlet's post, but that needs 50 rep (RIP).

For anyone who has found this post after a lot of googling: I found a way to not only find specific files within a given time range, but also NOT include the relative paths OR trip over the whitespace that would cause tar errors. (THANK YOU SO MUCH STEVE.)

find . -name "*.pdf" -type f -mtime 0 -printf "%f\0" | tar -czvf /dir/zip.tar.gz --null -T -
  1. . start from the current directory (paths are relative to it)

  2. -name "*.pdf" look for PDFs (or any other file type)

  3. -type f match regular files only

  4. -mtime 0 match files modified in the last 24 hours

  5. -printf "%f\0" print only the file name (no leading directories), terminated by a NUL character. Regular -print0 or -printf "%f" did NOT work for me. From the man page:

This quoting is performed in the same way as for GNU ls. This is not the same quoting mechanism as the one used for -ls and -fls. If you are able to decide what format to use for the output of find then it is normally better to use '\0' as a terminator than to use newline, as file names can contain white space and newline characters.

  6. -czvf create an archive, filter it through gzip, verbosely list the files processed, and take the archive name as the next argument

  7. --null -T - read the NUL-terminated file names from standard input

Edit 2019-08-14:
I would like to add that I was also able to do essentially the same thing as the command above using just tar itself:

tar -czvf /archiveDir/test.tar.gz --newer-mtime=0 --ignore-failed-read *.pdf

I needed --ignore-failed-read in case there were no new PDFs for today.
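
One thing to watch with -printf "%f\0": it strips every leading directory, so tar can only find the files if they all sit directly in the directory it runs from. A possible variation, sketched here under the assumption of GNU find and GNU tar, prints paths relative to the starting point with %P and lets tar change into that directory with -C. All directory and file names below are hypothetical.

```shell
#!/bin/sh
# Hypothetical scratch layout; assumes GNU find (-printf) and GNU tar (-C, --null, -T).
work=$(mktemp -d)
mkdir -p "$work/docs/nested"
touch "$work/docs/a b.pdf" "$work/docs/nested/c.pdf"

# %P prints each path relative to the starting point, so the names resolve
# once tar has changed into that same directory with -C.
find "$work/docs" -name "*.pdf" -type f -printf '%P\0' |
    tar -czf "$work/pdfs.tar.gz" -C "$work/docs" --null -T -

listing=$(tar -tzf "$work/pdfs.tar.gz" | sort)
echo "$listing"
```

The archive then holds a b.pdf and nested/c.pdf without the scratch directory prefix, and files in subdirectories are still found.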

第几種人 2024-11-12 04:18:31

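Why not give something like this a try: tar cvf scala.tar `find src -name '*.scala'` (the pattern is quoted so the shell does not expand it first; note that the backtick substitution still splits on whitespace, so this only works when no path contains spaces)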

流年已逝 2024-11-12 04:18:31

Another solution as seen here:

find var/log/ -iname "anaconda.*" -exec tar -cvzf file.tar.gz {} +
捶死心动 2024-11-12 04:18:31

The best solution seems to be to create a file list and then archive the files, because you can build the list from other sources and do something else with it.

For example, this allows using the list to calculate the size of the files being archived:

#!/bin/sh

backupFileName="backup-big-$(date +"%Y%m%d-%H%M")"
backupRoot="/var/www"
backupOutPath=""

archivePath="${backupOutPath}${backupFileName}.tar.gz"
listOfFilesPath="${backupOutPath}${backupFileName}.filelist"

#
# Make a list of files/directories to archive (one path per line)
#
: > "$listOfFilesPath"
echo "${backupRoot}/uploads" >> "$listOfFilesPath"
echo "${backupRoot}/extra/user/data" >> "$listOfFilesPath"
find "${backupRoot}/drupal_root/sites/" -name "files" -type d >> "$listOfFilesPath"

#
# Size calculation (du -sb is GNU-specific: apparent size in bytes)
#
sizeForProgress=$(
    while IFS= read -r nextFile; do
        if [ -n "$nextFile" ]; then
            du -sb "$nextFile"
        fi
    done < "$listOfFilesPath" | awk '{size+=$1} END {print size+0}'
)

#
# Archive with progress
#
## simple with dump of all files currently archived
#tar -czvf "$archivePath" -T "$listOfFilesPath"
## progress bar
sizeForShow=$((sizeForProgress/1024/1024))
printf '\nRunning backup [source files are %s MiB]\n\n' "$sizeForShow"
tar -cPp -T "$listOfFilesPath" | pv -s "$sizeForProgress" | gzip > "$archivePath"
深海夜未眠 2024-11-12 04:18:31

A big warning about several of the solutions (and your own test):

When you run: anything | xargs something

xargs will try to fit as many arguments as possible after "something", but you may still end up with multiple invocations of "something".

So your attempt: find ... | xargs tar czvf file.tgz
may end up overwriting "file.tgz" at each invocation of tar by xargs, leaving you with only the contents of the last invocation! (The chosen solution uses the GNU-specific -T option to avoid the problem, but not everyone has GNU tar available.)

You could do this instead:

find . -type f -print0 | xargs -0 tar -rvf backup.tar
gzip backup.tar

Proof of the problem on cygwin:

$ mkdir test
$ cd test
$ seq 1 10000 | sed -e "s/^/long_filename_/" | xargs touch
    # create the files
$ seq 1 10000 | sed -e "s/^/long_filename_/" | xargs tar czvf archive.tgz
    # will invoke tar several times, as it can't fit 10000 long filenames into one command line
$ tar tzvf archive.tgz | wc -l
60
    # on my own machine, I end up with only the last 60 filenames,
    # as the last invocation of tar by xargs overwrote the previous one(s)

# proper way to invoke tar: with -r (which appends to an existing tar file, whereas c would overwrite it)
# caveat: the archive can't be compressed (you can't append to a compressed archive)
$ seq 1 10000 | sed -e "s/^/long_filename_/" | xargs tar rvf archive.tar   # -r, and without z
$ gzip archive.tar
$ tar tzvf archive.tar.gz | wc -l
10000
  # we have all our files, despite xargs making several invocations of the tar command

Note: this behavior of xargs is a well-known difficulty, and it is also why, when someone wants to do:

find .... | xargs grep "regex"

they instead have to write:

find ..... | xargs grep "regex" /dev/null

That way, even if the last invocation of grep by xargs receives only one filename, grep sees at least two file operands (each time it gets /dev/null, where it won't find anything, plus the filename(s) appended by xargs) and thus always displays the file name when something matches "regex". Otherwise you may end up with the last results showing matches without a filename in front.
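
The /dev/null trick is easy to check in isolation. This is a small sketch with a hypothetical scratch file that compares grep's output for one file operand against two.

```shell
#!/bin/sh
# Hypothetical scratch file, used only for illustration.
demo=$(mktemp -d)
printf 'needle\n' > "$demo/only.txt"

# One file operand: grep prints the matching line without a file name.
bare=$(grep "needle" "$demo/only.txt")

# Two operands (/dev/null never matches): grep prefixes the file name.
named=$(grep "needle" /dev/null "$demo/only.txt")

echo "single operand: $bare"
echo "with /dev/null: $named"
```

The second line of output carries the path followed by a colon, which is the form you get consistently once /dev/null is always on the command line.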

墨小墨 2024-11-12 04:18:30

Use this:

find . -type f -print0 | tar -czvf backup.tar.gz --null -T -

It will:

  • deal with files with spaces, newlines, leading dashes, and other funniness
  • handle an unlimited number of files
  • avoid repeatedly overwriting your backup.tar.gz, which is what tar -c with xargs does when you have a large number of files
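
As a quick sanity check of the command above, the following sketch builds a throwaway tree with awkward names and counts the archive members afterwards. It assumes GNU find and GNU tar; the directory layout and file names are hypothetical. The archive is written outside the scanned tree so that find cannot list the archive itself.

```shell
#!/bin/sh
# Hypothetical scratch layout; assumes GNU find and GNU tar.
work=$(mktemp -d)
mkdir -p "$work/src/sub dir"
touch "$work/src/plain.txt" "$work/src/with space.txt" "$work/src/sub dir/-leading dash.txt"

cd "$work/src" || exit 1
# Write the archive outside the tree being scanned so it never lists itself.
find . -type f -print0 | tar -czf "$work/backup.tar.gz" --null -T -

# Count archive members; every file should appear exactly once.
member_count=$(tar -tzf "$work/backup.tar.gz" | wc -l)
echo "archived members: $member_count"
```

All three files end up in the archive, including the ones with spaces in their names or in a parent directory.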

过潦 2024-11-12 04:18:30

There could be another way to achieve what you want. Basically,

  1. Use the find command to output the path of every file you're looking for. Redirect stdout to a file name of your choosing.
  2. Then run tar with the -T option, which lets it take a list of file locations (the one you just created with find!)

    find . -name "*.whatever" > yourListOfFiles
    tar -cvf yourfile.tar -T yourListOfFiles
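
A newline-separated list handles spaces but not file names that themselves contain newlines. If that matters, one possible variation (assuming GNU find and GNU tar) is to store the list NUL-terminated and tell tar about it with --null. The scratch directory and file names below are hypothetical.

```shell
#!/bin/sh
# Hypothetical names throughout; assumes GNU find and GNU tar.
work=$(mktemp -d)
mkdir "$work/data"
touch "$work/data/one.whatever" "$work/data/two words.whatever"

cd "$work/data" || exit 1
# Store NUL-terminated paths so names containing newlines stay intact.
find . -name "*.whatever" -print0 > "$work/yourListOfFiles"
tar -cvf "$work/yourfile.tar" --null -T "$work/yourListOfFiles"

archived=$(tar -tf "$work/yourfile.tar" | wc -l)
echo "archived members: $archived"
```

The list file stays reusable for other steps, though it is no longer comfortable to read or edit by hand.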
    
意犹 2024-11-12 04:18:30

Try running:

    find . -type f | xargs -d "\n" tar -czvf backup.tar.gz 