Bash: symlink following

Posted on 2024-10-31 21:14:06

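I have a file tree containing a bunch of data I generated. At several stages of the data generation, I want to try some different configuration options in the programs I'm using.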

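My solution was to duplicate the data tree, with symlinks to all the original data (once for each new test I'm running). Then I'd let the programs clobber the symlinks as needed. The result would be symlinks into the original tree for data unaffected by my new configuration, and real data for anything the new configuration produced.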
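A minimal sketch of building those symlinked copies, assuming GNU coreutils: with `-r`, GNU `cp -s` recreates the directories as real directories and turns every file into a symlink. The names `orig`, `trial1` and `trial2` are hypothetical. `cp -s` needs an absolute source path when the links are created in another directory, hence `$PWD`.

```shell
#!/usr/bin/env bash
# Sketch: one symlink mirror of the original data tree per trial (GNU cp).
# "orig", "trial1" and "trial2" are hypothetical names.
set -eu
cd "$(mktemp -d)"                  # scratch directory for the demo

orig="$PWD/orig"                   # absolute path: cp -s needs it for links elsewhere
mkdir -p "$orig/stage1"
echo "raw data" > "$orig/stage1/a.dat"

for trial in trial1 trial2; do
    # -r recreates the directory tree; -s makes each file a symlink into $orig.
    # The destination must not exist yet, or cp would create $trial/orig instead.
    cp -rs "$orig" "$trial"
done

readlink trial1/stage1/a.dat       # the link points back into the original tree
```

Each trial directory can then be handed to the programs with the new configuration, which is exactly where the clobbering problem below appears.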

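The problem is that the -clobber option of most of the programs I use follows symlinks, so it actually clobbers my original data. Is there anything I can try (perhaps something like a bash environment setting?) that would make all these programs clobber the symlink itself rather than the data it points to?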

风吹短裙飘 2024-11-07 21:14:06

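This may be impossible. The option probably works by simply opening the file in "truncate" mode; to replace a symlink, the program would actually have to delete the file (the symlink) beforehand with a separate function call. You could try setting permissions on the original data files (chmod -w) to disallow writing, but that would probably just prevent the programs from working.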
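The mechanism this answer guesses at can be reproduced with the shell's own `>` redirection, which opens a file with truncation, much as the programs' clobber options probably do (an assumption; each program may behave differently). All file names here are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: a truncating write follows a symlink; unlinking first does not.
# File names are hypothetical stand-ins for one original file and its link.
set -u
cd "$(mktemp -d)"

echo original > data.orig
ln -s data.orig out

# Opening "out" for writing with truncation (what the shell's > does)
# writes through the link into data.orig.
echo clobbered > out
cat data.orig              # prints: clobbered

# Deleting the link first makes the same write create a new, independent file.
echo original > data.orig
rm -f out
echo new > out
cat data.orig              # prints: original

# chmod -w on an original makes writes through a link fail instead
# (for non-root users only; root bypasses the permission bits).
echo original > data2.orig
ln -s data2.orig out2
chmod a-w data2.orig
{ echo clobbered > out2; } 2>/dev/null || echo "write through out2 refused"
```

The `rm -f` step is the "separate function" the answer refers to: only an explicit unlink (or a rename over the link) replaces the symlink itself.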

百变从容 2024-11-07 21:14:06

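Do the programs check whether the new output is the same as the old output before writing? If not, the symlinks don't seem to offer any advantage: every output gets rewritten when the analysis runs, so the data store always ends up holding a separate copy of everything.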

Can you change the way the programs write their output? The following sequence avoids following the symlink:

write the new output to out.tmp
if the old output out and out.tmp are identical, delete out.tmp
otherwise, move out.tmp on top of out
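A minimal sketch of that sequence as a bash helper, assuming you can make each program write to `out.tmp` instead of `out`; the helper name `commit_output` is hypothetical. It relies on `mv` renaming over the directory entry, so a symlink at `out` is replaced rather than written through. `-T` is GNU `mv`'s `--no-target-directory`, which stops `mv` from treating a symlink to a directory as a destination folder.

```shell
#!/usr/bin/env bash
# Sketch of "write to out.tmp, then keep or replace out".
# commit_output is a hypothetical helper name, not an existing tool.
set -eu

commit_output() {                     # usage: commit_output OUT  (expects OUT.tmp)
    local out=$1 tmp=$1.tmp
    if [ -e "$out" ] && cmp -s -- "$out" "$tmp"; then
        rm -f -- "$tmp"               # identical: leave out (possibly a symlink) alone
    else
        mv -fT -- "$tmp" "$out"       # rename replaces the link itself, never its target
    fi
}

# Demo in a scratch directory: "out" starts as a link to original data.
cd "$(mktemp -d)"
echo same > orig.dat
ln -s orig.dat out

echo same > out.tmp                   # program produced identical output
commit_output out                     # out stays a symlink
echo different > out.tmp              # program produced new output
commit_output out                     # out becomes a regular file
cat orig.dat                          # prints: same
```

Unchanged outputs keep pointing into the original tree, and only outputs that actually changed take up new space.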

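How big is the data? Is it big enough that it's worth the effort to optimize the storage requirement? You can always save the full output and run some analysis afterwards (in the simplest case, diff) to see whether the data is the same.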
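For the after-the-fact comparison, a sketch using GNU `diff` on two hypothetical output directories, `baseline` and `trial1`. `-r` recurses, `-q` only reports whether files differ, and `-s` also reports files that are identical.

```shell
#!/usr/bin/env bash
# Sketch: keep full output per run, compare against the baseline afterwards.
# "baseline" and "trial1" are hypothetical output directories.
set -u
cd "$(mktemp -d)"
mkdir baseline trial1
echo A > baseline/x.dat; echo A > trial1/x.dat   # unchanged by the new config
echo B > baseline/y.dat; echo C > trial1/y.dat   # changed by the new config

# One line per file pair, saying whether they are identical or differ.
diff -rqs baseline trial1
```

Files reported as identical are the candidates for replacing with links afterwards.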

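If it is many GBs of data, you may want to look into a filesystem that optimizes duplicate data for you (the feature is known as "de-duplication"). Or you can use LVM's snapshot support, which allows cheap copy-on-write snapshots of a filesystem.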

As a poor man's de-duplicating filesystem, you can do something like this in bash:

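db="$PWD/db"    # absolute path, so the links work from any directory
mkdir -p "$db"
for file in "${output_files[@]}"; do    # output_files is assumed to be a bash array
    md5=$(md5sum -- "$file" | awk '{print $1}')
    if [ ! -f "$db/$md5" ]; then
        mv -- "$file" "$db/$md5"      # first copy of this content: move it into the store
    fi
    ln -sf -- "$db/$md5" "$file"      # replace the file with a link into the store
done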
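The same loop can be packaged as a self-contained function and fed from `find`. This is a sketch assuming bash 4 or later (for `mapfile`) and GNU `md5sum`; the function name `dedup` and the directories `db` and `outputs` are hypothetical. `find -type f` matches only regular files, so files already replaced by links are skipped on a re-run.

```shell
#!/usr/bin/env bash
# Sketch: the de-duplication loop wrapped in a function, fed from find.
# "dedup", "outputs" and "db" are hypothetical names.
set -eu
cd "$(mktemp -d)"

dedup() {                      # usage: dedup DBDIR FILE...
    local db file md5
    db=$(cd "$1" && pwd)       # absolute path, so links resolve from any directory
    shift
    for file in "$@"; do
        md5=$(md5sum -- "$file" | awk '{print $1}')
        [ -f "$db/$md5" ] || mv -- "$file" "$db/$md5"   # store first copy only
        ln -sf -- "$db/$md5" "$file"                    # link every copy to it
    done
}

mkdir -p db outputs/run1 outputs/run2
echo same > outputs/run1/a.dat
echo same > outputs/run2/a.dat
echo other > outputs/run2/b.dat

mapfile -t output_files < <(find outputs -type f)
dedup db "${output_files[@]}"
```

The two identical `a.dat` files end up as links to a single stored copy, while `b.dat` gets its own entry in `db`.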
