Bash: following symlinks

Posted 2024-10-31 21:14:06

I've got a file tree containing a bunch of data I've generated. I've decided that at several stages of the data generation, I'd like to try some different configuration options in the programs that get used.

My solution was duplicating the data tree, and symlinking to all of the original data (multiplied by the number of new tests that I'm running). Then I'd let the programs clobber away the symlinks as needed. The result would be symlinks to the original tree for data that didn't get affected by my new configurations, and real data for anything new.

The problem is that the -clobber option on most of the programs I use follows symlinks, so it in fact clobbered my original data. Is there anything I could try (maybe something like bash environment settings?) that might make all of these programs clobber the actual symlink, rather than the data it points at?

Comments (2)

风吹短裙飘 2024-11-07 21:14:06

This is probably not possible. The option most likely works by simply opening the output file in 'truncate' mode, and opening a path for writing follows a symlink. To replace the symlink itself, the program would have to delete the file [the symlink] beforehand with a separate call. You could try removing write permission from the original data files (chmod -w), but that might simply make the programs fail instead of writing anywhere.
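
A small sketch of the difference described above, run in a scratch directory. The names orig/, test/ and out are made up for illustration; the shell's > redirection stands in for a program that opens its output file in truncate mode.

```shell
# Work in a throwaway directory so nothing real is touched
work=$(mktemp -d)
cd "$work"
mkdir orig test
echo "original data" > orig/out
ln -s ../orig/out test/out          # test/out is a symlink to the original

# Writing "through" the link truncates and overwrites the original file
echo "new data" > test/out
after_write=$(cat orig/out)         # the original now holds "new data"

# Restore the original, then remove the link before writing
echo "original data" > orig/out
rm -f test/out                      # deletes the symlink only, not its target
echo "new data" > test/out          # creates a regular file in test/
after_unlink=$(cat orig/out)        # the original is still "original data"
```

Only the second variant leaves the original tree alone, and it needs the explicit delete step that a plain truncating open does not perform.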

百变从容 2024-11-07 21:14:06

Do the programs check whether the new output is the same as the old before writing? If not, then it doesn't seem like the symlinks will offer any advantage, since every output file gets rewritten, and so turned into its own real copy, in the course of running the analysis.

Are you able to change the way that the programs write their output? The following sequence will avoid following the symlink:

  1. write the new output to out.tmp
  2. if the old output out and out.tmp are identical, delete out.tmp
  3. otherwise, move out.tmp on top of out
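
The three steps above can be sketched as a small wrapper. write_output is a hypothetical helper name, and it assumes the program writes its result to stdout; a program that insists on writing to a named file would need to be pointed at the .tmp path instead.

```shell
# Hypothetical helper: write_output OUT CMD [ARGS...]
# Runs CMD and stores its stdout in OUT without ever writing through a symlink.
write_output() {
    local out=$1
    shift
    local tmp="$out.tmp"
    "$@" > "$tmp"                    # 1. write the new output to out.tmp
    if cmp -s "$out" "$tmp"; then    # 2. same bytes as the old output (cmp reads through the link)
        rm -f "$tmp"                 #    keep the symlink, drop the temporary file
    else
        mv -f "$tmp" "$out"          # 3. rename over out: replaces the link itself, not its target
    fi
}

# Demo in a scratch directory (all names are illustrative)
demo=$(mktemp -d)
mkdir "$demo/orig" "$demo/test"
echo "old result" > "$demo/orig/out"
ln -s ../orig/out "$demo/test/out"

write_output "$demo/test/out" echo "old result"   # unchanged output: the link stays
write_output "$demo/test/out" echo "new result"   # changed output: the link becomes a real file
```

The rename in step 3 is what protects the original tree: moving a file onto a symlink that points at a regular file replaces the link rather than the file it points to.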

How big is the data? Is it big enough that it's worth the effort to optimize the storage requirement? You can always save the full output, and run some analysis (in the simplest case, diff) after the fact to see if the data is the same.
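
As a rough illustration of the "compare after the fact" approach, the sketch below builds two tiny trees and asks diff which files differ. The directory and file names are invented for the example, and -q is assumed to be available (it is in GNU and BSD diff).

```shell
# Two small trees standing in for the original run and a test run
scratch=$(mktemp -d)
mkdir "$scratch/orig" "$scratch/test"
echo "same" > "$scratch/orig/a"
echo "same" > "$scratch/test/a"
echo "old"  > "$scratch/orig/b"
echo "new"  > "$scratch/test/b"

# -r walks both trees, -q reports only which files differ.
# diff exits with status 1 when it finds differences.
status=0
report=$(diff -rq "$scratch/orig" "$scratch/test") || status=$?
echo "$report"
```

Files that do not show up in the report are identical and are candidates for being replaced by links to the original.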

If it is many GBs of data, you may want to look into a filesystem that will optimize the duplicate data for you (the feature is known as "de-duplication"). Or, you can use LVM's snapshot support, which allows cheap copy-on-write snapshots of a filesystem.

As a poor-man's de-duplicating file system, you can do something like this in bash:

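db=$(pwd)/db    # absolute path, so the links resolve from any directory
mkdir -p "$db"
for file in $output_files; do    # assumes file names without whitespace
    md5=$(md5sum "$file" | awk '{print $1}')
    # First time this content is seen: move it into the store
    if [ ! -f "$db/$md5" ]; then
        mv "$file" "$db/$md5"
    fi
    # Replace the file with a symlink to the stored copy
    ln -sf "$db/$md5" "$file"
done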