Bash: symlink following
I've got a file tree containing a bunch of data I've generated. I've decided that at several stages of the data generation, I'd like to try some different configuration options in the programs that get used.
My solution was duplicating the data tree, and symlinking to all of the original data (multiplied by the number of new tests that I'm running). Then I'd let the programs clobber away the symlinks as needed. The result would be symlinks to the original tree for data that didn't get affected by my new configurations, and real data for anything new.
The problem is that the -clobber option on most of the programs I use follows symlinks, so it in fact clobbered my original data. Is there anything I could try (maybe something like bash environment settings?) that might make all of these programs clobber the actual symlink, rather than the data it points at?
Comments (2)
This is probably not possible. The option likely works by simply opening the file in 'truncate' mode, and opening a path in that mode follows a symlink to its target. To replace the symlink, the program would have to delete the symlink itself beforehand with a separate call. You could try removing write permission (chmod -w) on the original data files, but that might simply stop the programs from working.
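The difference between the two behaviours can be seen in a small experiment. This is only a sketch: shell redirection stands in for a program that opens its output with truncation (an assumption about how -clobber is implemented), and every file name is made up.

```shell
#!/usr/bin/env bash
# Throwaway demo directory; data.orig, link1 and link2 are invented names.
set -eu
demo=$(mktemp -d)
cd "$demo"

echo "original" > data.orig
ln -s data.orig link1
ln -s data.orig link2

# Truncate-and-write through the symlink: the path is resolved to its
# target, so data.orig is overwritten and link1 remains a symlink.
echo "clobbered" > link1
cat data.orig            # now shows the clobbered content

# Restore the data, then remove the symlink before writing: a new regular
# file takes the link's place and the target is left alone.
echo "original again" > data.orig
rm -f link2
echo "new result" > link2
```

After the second write, link2 is an ordinary file holding the new result while data.orig keeps its content, which is the behaviour the question is after.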
Do the programs check whether the new output is the same as the old before writing? If not, then it doesn't seem like the symlinks will offer any advantage, since every file in the datastore will end up as a unique copy in the course of running the analysis.
Are you able to change the way that the programs write their output? The following sequence will avoid following the symlink:
1. Write the output to out.tmp
2. If out and out.tmp are identical, delete out.tmp
3. Otherwise, move out.tmp on top of out
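The sequence above can be sketched as a small bash helper. The name commit_output and the OUT.tmp naming convention are assumptions made for illustration. The sketch also assumes the outputs are files rather than directories, and that the program can be told to write to the temporary name.

```shell
#!/usr/bin/env bash
# commit_output OUT: hypothetical helper; expects the new result in OUT.tmp.
commit_output() {
    local out=$1 tmp=$1.tmp
    if [ -e "$out" ] && cmp -s -- "$out" "$tmp"; then
        # Same bytes as the existing data (possibly reached via a symlink):
        # discard the new copy and leave the link in place.
        rm -f -- "$tmp"
    else
        # Different data: the rename replaces the directory entry, i.e. the
        # symlink itself, and never writes to the file it pointed to.
        mv -f -- "$tmp" "$out"
    fi
}
```

A run would then write to, say, results/out.tmp and finish with `commit_output results/out`. Unchanged outputs stay as symlinks into the original tree, and changed ones become real files.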
How big is the data? Is it big enough that it's worth the effort to optimize the storage requirement? You can always save the full output and run some analysis (in the simplest case, diff) after the fact to see if the data is the same.
If it is many GBs of data, you may want to look into a filesystem that will optimize the duplicate data for you (the feature is known as "de-duplication"). Or you can use LVM's snapshot support, which allows cheap copy-on-write snapshots of a filesystem.
As a poor-man's de-duplicating file system, you can do something like this in bash:
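The sketch below is one way such a pass might look, not necessarily the script the answer had in mind. It assumes each test writes a full copy of its output into its own tree, and that a file counts as a duplicate when it is byte-identical to the file at the same relative path in the original tree. The function name relink_duplicates and the directory arguments are hypothetical.

```shell
#!/usr/bin/env bash
# relink_duplicates ORIG NEW: hypothetical helper. Replaces each regular file
# under NEW that is byte-identical to the file at the same relative path
# under ORIG with a symlink to the ORIG copy.
relink_duplicates() {
    local orig new rel file
    orig=$(cd "$1" && pwd) || return 1   # absolute paths, so links resolve at any depth
    new=$(cd "$2" && pwd) || return 1
    # -type f skips existing symlinks; NUL separators keep odd file names intact.
    find "$new" -type f -print0 |
    while IFS= read -r -d '' file; do
        rel=${file#"$new"/}
        if [ -f "$orig/$rel" ] && cmp -s -- "$orig/$rel" "$file"; then
            ln -sf -- "$orig/$rel" "$file"
        fi
    done
}
```

Running `relink_duplicates original_tree test_tree` after a test finishes leaves real files only where the new configuration changed the data. Because the links are absolute, they break if the original tree is moved.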